11 September 2008

Wordcount Woes - Part 1

Do you spend much time fretting about wordcount?

My hunch is that translators worry about it more than agencies do, because it's often the only metric by which translators earn their daily bread. Agencies can also bill for project management, layout, graphics, consulting and rush charges, but most translators have one line item on their invoices: wordcount.

I suppose that we all live and die by it because everybody's calculations come down to wordcount - either source or target text - sooner or later. But no two tools define words the same way, so the count for the same file can vary considerably from tool to tool.
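
To make that concrete, here is a minimal Python sketch of two plausible definitions of a "word" applied to the same sentence. Both functions and the sample sentence are made up for illustration; neither is the counting rule of any particular TM tool or word processor.

```python
import re

def count_whitespace_words(text):
    # Definition A (hypothetical): a word is anything between whitespace,
    # so "state-of-the-art" and "it's" each count once.
    return len(text.split())

def count_alnum_runs(text):
    # Definition B (hypothetical): a word is an unbroken run of letters,
    # digits or underscores, so hyphens and apostrophes split words apart.
    return len(re.findall(r"\w+", text))

sample = "Double-click the icon; it's a state-of-the-art tool."

print(count_whitespace_words(sample))  # 7
print(count_alnum_runs(sample))        # 12
```

Same sentence, 7 words or 12, depending on who is counting. Multiply that gap across a long manual and the invoice moves with it.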

Still, the bigger issue with wordcount is "wordcount leakage." If you're working vendor-side, how many times have you quoted on a project, then realized that you had overlooked a chunk of text?

  • Graphics are the biggest culprit. The document contains charts and diagrams that require translation, but TM tools don't find those words. Many vendors wisely exclude such text from wordcount and cover it in an hourly or per-graphic charge. (Nobody can ever find the source files for the graphics so that you can localize them properly, but that's a whole other talk show.)
  • Bookmarked text is also slippery. It appears as text (sentences, paragraphs) in one place and is cross-referenced from other places in the document. True, you only translate it once, but you still need to deal with it - layout, formatting, page flow - everywhere it appears.
  • Conditional text, a favorite of FrameMaker professionals, can also cause you trouble. If you don't calculate wordcount with the conditions set to show all of the text, you may miss the hidden portions. The author should set this up before handoff.
  • Embedded documents (spreadsheets, word processing, HTML, presentations) are very sneaky. We just saw this the other day with an MS Word document that contained several embedded spreadsheets visible only as 1cm square icons on the page; double-clicking the icons opened up the embedded files. The TM tools don't see those words, but the client certainly would have if they had come back untranslated. Fortunately, we caught this in time.
The Moral: Two pairs of eyes should review every file before the TM analysis, NOT one pair of eyes and a TM software package.
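
A script can lend the second pair of eyes a hand with the embedded-document case. The sketch below assumes the file is in the newer .docx format, which is a zip archive in which Word conventionally stores embedded objects under word/embeddings/. It will not work on legacy .doc files, and it is a pre-flight check under those assumptions, not a replacement for opening the file and looking. The function names and the demo file are hypothetical.

```python
import io
import zipfile

def list_embedded_parts(docx_file):
    # Assumption: embedded objects sit under word/embeddings/ in the archive,
    # which is the usual convention for .docx files but not guaranteed.
    with zipfile.ZipFile(docx_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.startswith("word/embeddings/") and not name.endswith("/")
        ]

def build_demo_docx():
    # Builds a tiny stand-in archive in memory so the sketch runs without
    # a real Word file; the contents are placeholders, not valid documents.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<placeholder/>")
        archive.writestr("word/embeddings/Microsoft_Excel_Worksheet1.xlsx", b"placeholder")
    buffer.seek(0)
    return buffer

for part in list_embedded_parts(build_demo_docx()):
    print("Embedded file to review by hand:", part)
```

Anything the script lists is text your TM analysis probably never saw, so open it and count it before you quote.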
