29 November 2007

Keeping an eye on Catalyst

In localization, "Catalyst" is a tool from Alchemy Software. Among other things, it allows you to localize UI elements within software resource files, sometimes without the need to rebuild the software manually into binary format.

Since software binaries come from text files, part of Catalyst's value lies in straddling the divide between allowing the translator to change strings in these text files (say, from English to Japanese) and displaying them in the binary, run-time format in which the user will see them on screen.

Last month a vendor returned some resource files that we had asked them to localize from English to Japanese. I rebuilt the binaries (language-resource DLLs) and ran them. Unfortunately, a number of items were suddenly missing from the Japanese menus, so I had to troubleshoot the problem.

My first thought was that either a person or a tool (or a person using a tool) had modified something that should not be affected by the localization process. I had handed off a resource file containing these lines:

32777 MENU DISCARDABLE
BEGIN
POPUP "&Tools"
BEGIN
MENUITEM "Serial P&ort Settings...", ID_TOOLS_SERIALPORTSETTINGS
MENUITEM "&Network Settings...", ID_TOOLS_NETWORK
MENUITEM "&Battery Settings...", ID_TOOLS_BATTERYSETTINGS
END
END

32779 MENU DISCARDABLE
BEGIN
POPUP "&File"
END


They returned to me a resource file containing these strings:

9 MENU DISCARDABLE
BEGIN
POPUP "ツール(&T)"
BEGIN
MENUITEM "シリアルポートの設定(&O)...", ID_TOOLS_SERIALPORTSETTINGS
MENUITEM "ネットワーク設定(&N)...", ID_TOOLS_NETWORK
MENUITEM "バッテリの設定(&B)...", ID_TOOLS_BATTERYSETTINGS
END
END

11 MENU DISCARDABLE
BEGIN
POPUP "ファイル(&F)"
END

There was nothing wrong with the translation, and the string IDs were intact. The product has long been "double-byte clean," so I knew that the software was not gagging on the Japanese characters.

The problem lay in the menu ID numbers, which are 32777 and 32779 in the English but came back in the Japanese files as 9 and 11. The vendor believes that Catalyst changed them, since they had used it for resizing and QA.

Normally, this renumbering has no effect on how the binary functions. In this case, however, it breaks the menus: code somewhere in the software looks for "32777" and "32779", and when it doesn't find those IDs, it cannot complete the menu. This is poor internationalization in the code base, which I have discussed with Engineering to no avail, so I have to police the resource files in each round of localization.
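
Some of that policing can be scripted. Here is a minimal sketch, assuming the .rc files have already been read into Python strings and that each menu resource starts on its own line with a numeric ID followed by the keyword MENU, as in the excerpts above. The function names are hypothetical; this is not part of Catalyst or any Alchemy tool. Because the IDs themselves are what changed, it pairs menus by their order in the file and reports every pair whose IDs differ.

```python
import re

# Matches a resource header such as "32777 MENU DISCARDABLE" and captures its ID.
# Assumption: menu resources use numeric IDs at the start of a line.
# MENUITEM and MENUEX lines do not match because of the \b after MENU.
MENU_HEADER = re.compile(r"^[ \t]*(\d+)[ \t]+MENU\b", re.MULTILINE)


def menu_ids(rc_text: str) -> list[int]:
    """Return the numeric MENU resource IDs in the order they appear."""
    return [int(m.group(1)) for m in MENU_HEADER.finditer(rc_text)]


def renumbered_menus(source_rc: str, target_rc: str) -> list[tuple[int, int]]:
    """Pair menus by position; return (source ID, target ID) for each mismatch."""
    src, tgt = menu_ids(source_rc), menu_ids(target_rc)
    if len(src) != len(tgt):
        # A dropped or added menu would shift every pairing after it.
        raise ValueError(f"menu count differs: {len(src)} vs {len(tgt)}")
    return [(s, t) for s, t in zip(src, tgt) if s != t]


english = """32777 MENU DISCARDABLE
BEGIN
    POPUP "&Tools"
END

32779 MENU DISCARDABLE
BEGIN
    POPUP "&File"
END
"""

japanese = """9 MENU DISCARDABLE
BEGIN
    POPUP "ツール(&T)"
END

11 MENU DISCARDABLE
BEGIN
    POPUP "ファイル(&F)"
END
"""

for src_id, tgt_id in renumbered_menus(english, japanese):
    print(f"Menu {src_id} came back as {tgt_id}")
# Menu 32777 came back as 9
# Menu 32779 came back as 11
```

Running a check like this on every hand-back catches the renumbering before the DLLs are rebuilt, rather than after the menus go missing.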

How is Catalyst working for you? Have you seen similar problems?

Interested in this topic? You might enjoy another article I've written called "Localized Binaries - The Plot Thickens"


22 November 2007

Have you cleaned behind your glossaries?

Don't take this question too personally. After all, I'm not asking whether you've cleaned behind your ears, or behind your couch. But last week I asked the digital question, "Where do your glossaries live?" and this week I'm asking about the state of their hygiene.

One of my client-companies is quite proud (and justifiably so) of the considerable work they did a couple of years ago in building out a 600+ entry glossary in ten languages. They (or their language vendor, really) have hosted it on the Web, with read-only access to any translator who does work for them.

This model of glossary has the inestimable benefits of being universal, up-to-date and centralized - there is only one glossary - instead of being a patchwork of spreadsheets and tables on several different hard drives in several states of accuracy. It's set up for alpha-listed browsing and search, although the search function is not fuzzy unless you use wildcards, so some translators will not derive full benefit from it and may in fact miss terms.
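
To see why a non-fuzzy search can make translators miss terms, compare three kinds of lookup on a few hypothetical English entries. This sketch uses Python's standard fnmatch and difflib modules; the entries and the 0.8 similarity cutoff are assumptions for illustration, not details of the client's system.

```python
import difflib
import fnmatch

# Hypothetical English glossary entries.
ENTRIES = ["link drive", "offset drive", "drive mechanics", "rack-mount drive"]


def exact_search(query: str) -> list[str]:
    """Case-insensitive exact match: the query must equal the whole entry."""
    return [e for e in ENTRIES if e.lower() == query.lower()]


def wildcard_search(pattern: str) -> list[str]:
    """Shell-style wildcards such as '*drive*', matched case-insensitively."""
    return [e for e in ENTRIES if fnmatch.fnmatchcase(e.lower(), pattern.lower())]


def fuzzy_search(query: str, cutoff: float = 0.8) -> list[str]:
    """Similarity-based match using difflib's ratio (0.0 to 1.0)."""
    return difflib.get_close_matches(query.lower(), ENTRIES, n=5, cutoff=cutoff)


print(exact_search("rackmount drive"))     # [] : the hyphen defeats an exact search
print(wildcard_search("rack*drive"))       # ['rack-mount drive'] : only if you think of the wildcard
print(fuzzy_search("rackmount drive"))     # ['rack-mount drive'] : found without any special syntax
```

The translator who types the term the way it appears in the source text, without the hyphen, gets nothing back from the exact search and may reasonably conclude the term isn't in the glossary.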

While managing a sample translation for the client, I asked whether I could export the glossary to review it all at a glance. "Nope. That's not possible," the client told me, with more than a hint of pride. "We designed it so that there would be only one glossary in one format in one place. We don't want it exported or circulated unnecessarily."

Now, I'm in business to see my clients succeed, but that kind of mindset is just a tempting challenge to me, and as I managed the sample translation I deliberately looked for reasons why a hermetically sealed glossary like this was a bad idea. Naturally, I found one: The client had not cleaned very well behind their glossary.

Several industry-specific terms occur in the sample, and I knew the translators would be obliged to use the glossary. For instance, terms like "drive" occur in various combinations ("link drive," "offset drive," "drive mechanics," "rack-mount drive," etc.) in the glossary, and as I poked from one entry to another I noticed inconsistencies and contradictions in how "drive" was translated, notably in German. One entry gave "Laufwerk" as the translation, and another entry bore the note that "'Laufwerk' is obsolete."

The online model for hosting this glossary is a good one for several reasons, but it's not amenable to the healthy, periodic scrub that such databases should undergo. If the glossary were exportable, or at least visible in row-and-column format, these inconsistencies would be easier for translators to spot and address.
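
With the glossary in row-and-column form, even a few lines of script can surface a contradiction like the "Laufwerk" one. This is a minimal sketch, assuming a hypothetical CSV export with en, de and note columns, and assuming obsolete terms are flagged in the notes with the phrase "is obsolete". The sample rows are illustrative, not the client's actual data.

```python
import csv
import io
import re

# Hypothetical export: English entry, German entry, free-text note.
EXPORT = """en,de,note
drive,Antrieb,'Laufwerk' is obsolete
rack-mount drive,Rack-Laufwerk,
drive mechanics,Antriebsmechanik,
"""

# Assumption: notes flag obsolete terms as "'<term>' is obsolete".
OBSOLETE_NOTE = re.compile(r"'([^']+)' is obsolete")


def find_contradictions(csv_text: str) -> list[tuple[str, str, str]]:
    """Return (English entry, German entry, obsolete term) for every row that
    still uses a German term that some row in the glossary declares obsolete."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    obsolete = {m.group(1)
                for row in rows
                for m in OBSOLETE_NOTE.finditer(row["note"] or "")}
    hits = []
    for row in rows:
        for term in sorted(obsolete):
            if term.lower() in row["de"].lower():
                hits.append((row["en"], row["de"], term))
    return hits


for en, de, term in find_contradictions(EXPORT):
    print(f"{en!r} -> {de!r} still uses obsolete {term!r}")
```

On a 600-entry, ten-language glossary, a pass like this takes seconds, whereas clicking from entry to entry in a browser relies on someone happening to notice.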

Interested in this topic? You might enjoy another article I've written called "Where do your glossaries live?"


16 November 2007

Where do your glossaries live?

The experienced project manager with your localization/translation vendor approaches a new client/project by asking you, "Has this ever been translated before?" Her big goal is to discover whether there's a translation memory database floating around, to help her translators do their work more quickly and keep your costs low, and her background goal is to find existing documents with key terms already translated and approved.

Smart companies maintain these key terms in a "glossary" or terminology list. Glossaries are far less comprehensive than translation memory because they serve a slightly different purpose: Instead of proposing a fuzzy-match translation for an entire sentence, they serve as a reference for the translators. Good translators know how to find translations for generally accepted terms like "closed-loop servomechanism" and "high-definition multimedia interface," but if the sales manager in your Shanghai office has already told you how he likes to see the word translated, everybody will be happier if that preference is observed.
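
The difference between the two resources can be sketched in a few lines. Below, a hypothetical one-entry translation memory proposes a whole-sentence translation when a new sentence is similar enough (scored with Python's difflib), while a hypothetical glossary simply reports the approved translation of any term it recognizes. The 0.75 threshold and the sample data are assumptions; commercial tools use their own matching algorithms.

```python
import difflib
import re

# Hypothetical translation memory: English source sentence -> approved German target.
TM = {
    "Check the drive mechanics before operating the unit.":
        "Prüfen Sie die Antriebsmechanik, bevor Sie das Gerät in Betrieb nehmen.",
}

# Hypothetical glossary: English term -> approved German term.
GLOSSARY = {"drive mechanics": "Antriebsmechanik", "unit": "Gerät"}


def tm_fuzzy_match(sentence, threshold=0.75):
    """Return (score, source, target) for the best whole-sentence match, or None."""
    best = None
    for src, tgt in TM.items():
        score = difflib.SequenceMatcher(None, sentence, src).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, src, tgt)
    return best


def glossary_hits(sentence):
    """Return the approved translation of every glossary term found as whole words."""
    lowered = sentence.lower()
    return {term: tgt for term, tgt in GLOSSARY.items()
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)}


new_sentence = "Check the drive mechanics before servicing the unit."
print(tm_fuzzy_match(new_sentence))  # proposes the stored German sentence for editing
print(glossary_hits(new_sentence))   # tells the translator which terms are mandated
```

The TM hands the translator a sentence to edit; the glossary hands over no sentences at all, only the terms the translator is expected to use.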

So where do your glossaries live?

"Live" is the important word, because glossaries change and grow with time. Most glossaries I've seen are in a spreadsheet or word processing document. While that's better than nothing, it can suffer from decentralization, since updates don't always make it to everybody involved in the project, and some translators run the risk of using old terminology.

One of my more localization-savvy clients makes its glossary available on its partner portal, requiring a login and password. The PHP-based application, which is actually hosted by a translation vendor, allows searching in multiple languages. My client deliberately does not make the glossary available for download or export; this ensures that everybody is using the same version with all updates.

I like this model. The assets reside on the client/owner's site, and the terminology "lives" with the linguistic experts, who can easily modify it. It's a bit more work for the translator, who would rather have a flat-file document, but overall it serves linguistic interests well. It's tried-and-true technology built into most computer-aided translation tools.

What are you doing with your glossaries?


09 November 2007

"Why are you charging me for that?" - Part 2

Do you have manuals, resource files, help projects or entire Web sites that you've been localizing for several years and through several versions? Have you thought about the "permafrost" in those files; i.e., the sentences, paragraphs, pages and chapters that haven't changed in ages?

Are you being charged for them in your localization efforts?

In my experience, vendor pricing includes discounts for segments (usually entire sentences or bits of text surrounded by paragraph markers) with high match rates to text that has already been translated. So, a new 30-word sentence at $.25/word may cost $7.50, but a 30-word sentence that does not change at all from one version to the next may cost $.03/word, or $.90.
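
The arithmetic is simple, but it shows how the permafrost adds up. In this sketch the $.25 and $.03 per-word rates are the ones quoted above; the fuzzy-match band and the word counts in the last line are hypothetical placeholders, since every vendor has its own schedule. Decimal is used so the totals come out to the cent.

```python
from decimal import Decimal

# Per-word rates by match band. "new" and "exact_100" are the rates quoted
# above; the fuzzy band is a hypothetical placeholder.
RATES = {
    "new": Decimal("0.25"),
    "fuzzy_85_99": Decimal("0.12"),  # hypothetical
    "exact_100": Decimal("0.03"),
}


def band_cost(words: int, band: str) -> Decimal:
    """Cost of a number of words billed in one match band."""
    return words * RATES[band]


def job_cost(words_by_band: dict[str, int]) -> Decimal:
    """Total cost of a job, given its word counts per match band."""
    return sum((band_cost(n, b) for b, n in words_by_band.items()), Decimal("0"))


print(band_cost(30, "new"))        # 7.50
print(band_cost(30, "exact_100"))  # 0.90
# Hypothetical update: 500 new words inside 20,000 unchanged ones.
print(job_cost({"new": 500, "exact_100": 20000}))  # 725.00
```

In that hypothetical update, the unchanged words cost more in total ($600.00) than the new ones ($125.00), which is exactly the line item you will be asked to explain.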

But why are you charging me for that?

Vendors have different rationales (and they are welcome to post them here) which often boil down to the necessity to "touch" the words in one way or another: either in engineering unchanged paragraphs into the new manual, or in translation memory maintenance, or in the human editing pass when eyes land upon them. These words are the spare tire of localization, in that they haven't changed, but they're still along for the ride, and moving their weight requires some modicum of additional gasoline.

As a localization manager, try explaining that to your boss.

So, if I don't want to be charged for these words, or for any other words that don't require translation, what should I do?
  • Perform your own triage. Pull out code samples, for instance, which will never need to be translated, and hand them to your vendor in a text file. Ask that they be aligned to themselves in TM so that the words fall out as 100% matched. The translators won't touch them and you won't (shouldn't) be charged for them.
  • Move to a CMS. Deploying a content management system is a long-haul solution, but if this is a long-haul problem for you, look into it. With a CMS in place and an interface between your vendor and the system, it becomes easier for you and your vendor to separate matched from non-matched segments. What is easier should cost less.
  • Give instructions, not words. If you suspect that there have been only a few changes to a 40-page manual, use a diffing tool to find them, write up the changes, and hand them off to the vendor with instructions to charge you hourly for the spot-changes. If the vendor knows exactly what to change where, it eliminates the guesswork, the TM analysis, the file preparation and the engineering. This lowers your costs of localizing the book.
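
For the triage tip, "aligned to themselves" means the translation memory contains each code sample with an identical source and target, so the analysis reports it as a 100% match. TMX (Translation Memory eXchange) is an XML interchange format that most CAT tools can import. The sketch below builds a minimal TMX 1.4 file of self-aligned segments with Python's standard ElementTree; the language codes, the creation-tool name and the segment list are assumptions, so check with your vendor which TMX version and header attributes their tools expect.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml" prefix (xml:lang).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def self_aligned_tmx(segments, srclang="en-US", tgtlang="ja-JP"):
    """Build a TMX 1.4 document in which every target segment equals its source."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "self-align-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "paragraph",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for text in segments:
        tu = ET.SubElement(body, "tu")
        for lang in (srclang, tgtlang):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # same text on both sides
    return ET.tostring(tmx, encoding="unicode")


# Hypothetical code samples pulled out of a manual.
samples = ["int total = 0;", "for (int i = 0; i < n; i++) { total += i; }"]
print(self_aligned_tmx(samples))
```

ElementTree escapes characters such as "<" in the code samples, so the file stays well-formed XML.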

You should be able to have a calm discussion with your vendor about these jillions of unchanging words, and arrive at one or more methods for eliminating unnecessary work. Try to go beyond "Why are you charging me for that?" to "What can we do so that you don't have to charge me for that?"

Interested in this topic? You might enjoy another article I've written called "Why are you charging me for that? - Part 1"


02 November 2007

Amtrak.com in deutscher Sprache! (Amtrak.com in German!)

This is probably more in the domain of John Yunker, whose Global by Design site focuses on American companies coming out of their Americocentric stupor, but I'll mention that Amtrak's site has been localized into Spanish and German.

This is a hot one. Passenger train travel is not exactly all the rage. The network is not expanding noticeably, and even after Antarctica melts, Americans still aren't going to get out of their cars and take a train, except to amuse their children. Why throw marketing dollars at a localized Web site?

Why Spanish? Because hundreds of thousands of Hispanic Americans need to move from city to city, and if they're going to take the train, it's easier for them to research routes and schedules in their own language. On the other hand, the railroads in Mexico, in particular, are a popular joke, and buses long ago displaced trains as the default means of intercity passenger transportation. So it seems that Amtrak sees the demographic potential, but may have some cultural baggage to overcome in attracting this new ridership, not to mention the issue of whether their sector of the Hispanic market uses the Web (yet).

Why German? Because Germans (and Austrians and Swiss) believe in the trains, I suppose. This is even more intriguing than the Spanish site, because it required more research than simply picking up the newspaper and reading that Hispanic buying power in the U.S. will have risen 347% between 1990 and 2009, to almost $1 trillion ("The Multicultural Economy, 1990-2009", from the Selig Center for Economic Growth). The move to German must have involved polling actual passengers and getting hip to the fact that these people not only think in terms of train travel, but also use the Web to research it.

Both the Spanish and German sites are more than mere afterthoughts; they seem to be comprehensively translated, several levels deep. Notes:
  • They even translated "California" as "Kalifornien" in the state drop-down menus, no doubt as a nod to the governor.
  • Don't use accented characters when you enter your name on the Spanish site. The error message telling you what you did wrong and how to rectify it is still in English.
  • I don't know which credit card the Spanish- and German-speaking travelers are likely to use, but the only choices are Visa, MasterCard, AmEx, Diners and Discover. No debit cards, no PayPal.

Sometimes the mechanics of a localization project are less compelling than the story behind it. If you know the story behind the Amtrak localizations - or an offbeat story behind a project you've done - please post it here.

(Blogger's note: Travel between San Diego and Los Angeles by car has lost all of its allure, and I opt for Amtrak whenever I can. I have had multiple pleasant conversations in Spanish with people on this train route, usually people from Mexico visiting family in Southern California.)
