26 February 2007

Translation non-savings, Part I

How far will you go to improve your localization process?

Because of how localization is viewed in many companies, the best improvements are the ones that lower cost. Low cost helps keep localization inconspicuous, which is how most companies want it.

But if a big improvement didn't save any obvious money, would your organization go for it?

Elsewhere in this blog I relate the saga of the compiled help file with 3500+ HTML pages in it. These pages come from a series of Perl scripts that we run on the header files to extract all of the information about the product's API and wrap it up in a single, indexed, searchable CHM. In a variety of experiments, we've sought to move the focus of translation from the final HTML files to a point further upstream, at or near the header files themselves. If the raw content were translated, we believe, all downstream changes in the Perl scripts, which get revised quite often, would be imposed automatically on the localized CHM.

One of the biggest cost items - we have suspected - is due to changes in line wrapping and other HTML variations that confuse TM tools into thinking that matches are fuzzier than they really are. The false positives look like untranslated words when analyzed, so the wordcounts rise, and not in our favor.

"If we work with raw text, before HTML formatting," our thinking goes, "the match rate will rise."

Not.

I'll describe my experiment shortly.

Labels: , , , , ,