30 March 2007

Localization Testbenches, Part I

What are you using to test your localized products? If you're handing them to your domestic QA team and expecting that they'll intuitively test them with the correct language and locale settings, you may be in for an unpleasant surprise.

Of course, your testers need to have some tolerance for the extraordinary circumstance of not being able to read what they're testing. Testers with this level of tolerance have not been that easy to hire in single-language countries - which is one explanation for the success of globalization - but they do not take quite so much umbrage at it now that the writing is on the wall and the tools are more handy.

Also, there are two levels of testing: linguistic and functional. You do not need (or want) your domestic QA team to review the Italian translation; you want the translators to review it, and by the time you're handing your product to your QA team, linguistic review should be long since ended. In most cases, your QA team will know how to perform functional testing much more efficiently than the translators will, even though the UI is foreign. Encourage them to overcome the "How can I test this when I can't read it?" obstacle, either with your own evangelization, or with gentle, paycheck-indexed prodding from above. They have more value to add to the localization QA process than they suspect.

In this series, you'll read about testing 1) Software; 2) Web sites; and 3) Help files.


23 March 2007

Localizing Declarations of Conformity

Does your documentation contain Declarations of Conformity with European Community standards? If it does, here is some due diligence you should undertake before having the docs translated.

The EC has promulgated a long series of directives on a variety of industries ranging from aerospace to toys. Some of these directives describe industrial policy and consumer protection. If your product falls into the category of those covered by a set of directives, then 1) the product must conform to the directives; and 2) you must declare that it conforms and list the directives with which it conforms.

This second requirement leads to some of the driest text with which you'll ever fill pages in a user guide; for instance:

Protection requirements concerning electromagnetic compatibility to Article 3(1)(b)

Harmonised standards applied:

EN 301-489-1, V1.4.1 (2002-08); Electromagnetic compatibility and Radio spectrum Matters (ERM); Electromagnetic Compatibility (EMC) Standard for Radio Equipment and Service. Part 1: Common technical requirements

ETSI EN 301 489-25 V2.2.1 (2003-05)

Fascinating reading. And, it makes for even more fascinating translation work.

If you're localizing your U.S. product for sale in Germany, the translation of the names of these standards with which you're declaring conformity should match the German names acknowledged by the EC. You could hand off the English text to a German translator, who could trip through several technical dictionaries creating his own translation. The numbers of the directives would be correct (because not translated), but strictly speaking, the titles would not be correct, unless your translator was extremely lucky.

Fortunately, the EC has made this easy. Depending on the industry, they offer accepted translations of the titles and text of the directives in as many as twenty languages on their Web site. With a bit of digging, your translators can find and re-use approved text. This will not only save them (and you) time, but will also ensure a better fit for your localized documentation.


16 March 2007

How to pseudo-translate, Part II

You only speak one language, so maybe you'll never be a translator, but you have a chance as a pseudo-translator.

Pseudo-translation is the process of replacing or adding characters to your software strings to try to break the software, or at least uncover strings that are still embedded in the code and need to be externalized for proper localization. (Part I of this post describes why anybody would want to do such a thing.) Pseudo-translation is a big piece of internationalization (I18n), which you should undertake before you bother handing anything off to the translators.

Here's an example of a few strings from a C resource file, with their respective pseudo-translations:

IDS_TITLE_OPEN_SKIN "Select Device"
IDS_TITLE_OPEN_SKIN "日本Sイlイct Dイvウcイ本日"

IDS_MY_FOLDER "Directory:"
IDS_MY_FOLDER "日本Dウrイctエrユ:本本"

IDS_MY_OPEN "&Open"
IDS_MY_OPEN "日本&Opイn日"

IDS_WINDOW_NOT_ENOUGH_MEM
"Windows has not enough memory. You may lower the heap size specified in the configuration file."
IDS_WINDOW_NOT_ENOUGH_MEM
"日本Wウndエws hアs nエt イnエオgh mイmエrユ. Yエオ mアユ lエwイr thイ hイアp sウzイ spイcウfウイd ウn thイ cエnfウgオrアtウエn fウlイ.本日本日日本本本日日本日日本日本日本日本"

IDS_TARGET_INITIALIZATION_FAILED
"Failed to load or initialize the target."
IDS_TARGET_INITIALIZATION_FAILED
"日本Fアウlイd tエ lエアd エr ウnウtウアlウzイ thイ tアrgイt.日日本日本日本本"

In these strings, Japanese characters have been pushed in to replace the vowels in all English words. The goal of using Ja characters is to ensure that, when compiled, the strings will look and behave as they should under Windows Japanese; it's important to pseudo-translate with the right result in mind.

Some observations:
  1. Each string begins with Ja characters, since that will be the case in the real Japanese translation, and it's a situation worth testing.
  2. Each string contains enough English characters to allow the tester to "gist" the string from the context. This is helpful because pseudo-translation can often destroy the meaning of the string.
  3. Each string has a ratio of swell, with trailing characters adding 20% to the length of the string. This helps flush out fields and controls in which strings will be truncated.

Okapi Rainbow is an excellent (if somewhat inscrutable) text-manipulation utility for just this purpose. When run on all of the string files in the development project, the result is a set of resources which, when recompiled, will run as a pseudo-translated binary. With a testbench running the appropriate operating system, a tester can get a good idea of the I18n work in store for the developers.
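
If you'd like to see the mechanics without a dedicated tool, the three observations above can be sketched in a few lines of Python. This is only an illustration, not how Rainbow works internally: the function name `pseudo_translate`, the vowel-to-katakana table (inferred from the sample strings above) and the alternating padding pattern are all assumptions of this sketch. It also makes no attempt to protect format placeholders such as `%i` or `%u`, which a real tool would need to leave untouched.

```python
# Hypothetical sketch of pseudo-translation; names and mappings are
# assumptions for illustration, not part of any real tool.

# Lowercase vowels (and "y") are swapped for katakana, mirroring the
# sample strings in this post. Uppercase letters are left alone so the
# tester can still "gist" the string.
VOWEL_MAP = str.maketrans({
    "a": "ア", "e": "イ", "i": "ウ",
    "o": "エ", "u": "オ", "y": "ユ",
})

PREFIX = "日本"      # every string starts with Japanese characters
PAD_CHARS = "本日"   # characters cycled through to build the trailing swell


def pseudo_translate(text, swell_percent=20):
    """Return a pseudo-translated copy of text with leading and trailing padding."""
    body = text.translate(VOWEL_MAP)
    # Round the swell up, using integer arithmetic to avoid float surprises.
    pad_len = -(-len(text) * swell_percent // 100)
    padding = "".join(PAD_CHARS[i % len(PAD_CHARS)] for i in range(pad_len))
    return PREFIX + body + padding


if __name__ == "__main__":
    print(pseudo_translate("Select Device"))  # 日本Sイlイct Dイvウcイ本日本
    print(pseudo_translate("&Open"))          # 日本&Opイn本
```

Note that the accelerator marker `&` survives because only lowercase vowels are replaced; whether that is the right behavior for your resource format is something to confirm before running a script like this over real string files.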

Rare is the product that passes pseudo-translation testing on the first try, whether because of strings left behind in the code, resizing issues, string truncation, buffer overflows, or just plain bad luck.

Even if your code isn't perfect, though, look on the bright side: You're now a pseudo-translator.


13 March 2007

Localization versus Internationalization

Do you know the difference between localizing and internationalizing? Most people can go their entire lives without knowing it, but there is a useful distinction.

You localize your product when you customize it to the needs of a particular market, usually on geographic and linguistic bounds. So if you're Toyota, and you want to sell cars in California, you need to equip them with the more robust emission control systems required by law there. Or, if you want to sell them in the UK, you need to build them with the steering wheel on the right-hand side.

What you soon discover, though, is that it's prohibitively expensive to create and maintain separate production lines for California, the UK and other foreign markets, not to mention your domestic market. The challenge then becomes to properly internationalize your product so that the changes needed for each market cause as little disruption as possible to your production process. You design your cars so that there is some irreducible core that is the same, no matter where the car will be sold.

This is easier said than done, but in a technology company, where production lines move extremely fast - because it's all just software - it's important to bite the internationalization bullet early and often. The alternatives are multiple, straggler code lines; separate Web sites; and unintegrated, unwieldy sets of documentation.

So, when all about you are losing their heads over localization (L10n), you can be the calm voice of reason, asking whether the product is really ready for localization yet, or whether your colleagues should pause and think internationalization (I18n) first.

(Note: The term g l o b a l i z a t i o n is also used to describe the overall process of creating products for worldwide markets. The problem, in our search-engine-optimized day and age, is that the same term applies to the assimilation of national character and identity into the worldwide market, with undesirable ramifications to the disenfranchised. It's not wrong to talk about g l o b a l i z i n g your product, but if you go searching for information on localization, don't use the g-word.)


09 March 2007

Localization tail wags dog

Most of the time, when you're a U.S.-based company, you run the localization show. It's a matter of simple history: You created the product in the U.S. and pushed it out to other regions, so your domestic needs get met first. You may factor in their requirements so that they have a respectably localized product to sell, but by and large, you call the shots.

Not only that, but assume that your domestic sales (i.e., the sales for which you don't need to perform any localization) contribute 75% of your profit, and sales to regions requiring localization contribute the remaining 25%. This means you have both history and profitability in favor of your running the show.

Suppose, however, that the tail wags the dog. Suppose that your product is developed in Egypt or Liechtenstein or Hungary, where it has 95% of the market without really trying, and that the developers are insensitive to the need for a properly run localization effort. The product is receiving strong uptake in the U.S., where sales will soon overtake those in the home market, but it desperately needs some localization (into proper English), on which Engineering refuses to place much priority. You have profitability on your side, but the mothership has history, not to mention ownership of development.

How do you build a localization strategy around that?

Like any global business problem, the usual bromides of communication, onsite visits in both directions and strongly backed business cases go a long way towards solving this. If they were selling a U.S.-created product into their regions, they'd defer to U.S. preferences, but it's not that simple when the scheme is suddenly inverted like this.

You feel as though you're the dog, and the other guys are the tail wagging you, and the other guys think they're the dog, and you're the tail trying to wag them.

The real winners? Your competitors.


06 March 2007

How to pseudo-translate, Part I

Before you localize your software product, wouldn't you like to have an idea of what's going to break as a result?

If you've written it in English, it will surprise and alarm you to learn that that's no assurance that it will work when the user interface (UI) is in Chinese or Arabic or maybe even Spanish. The most conspicuous vulnerabilities are:
  • text swell, in which "prompt" becomes "Eingabeaufforderung" in German, for example, and the 40 pixels of width you've reserved in the English UI result in only a small part of the German appearing;
  • corrupted characters, which will show up in the UI as question marks or little black boxes because characters such as à, ü, ¿, ß, Ø and 日本語 aren't in the code page or encoding under which your software is compiled;
  • illegible or invalid names of files and paths, which occur when installing your software on an operating system that will handle more kinds of characters than your product will;
  • crashes, which occur when your software mishandles the strange characters so badly that the program just giggles briefly and then dies;
  • ethnocentric business logic, which leads to ridiculous results when users select unanticipated countries or currencies;
  • hard-coded anything, whether currency symbols, standards of measurement (metric vs. English) or UI strings.

In the past, localization efforts have become stranded on these beaches late in the voyage, after the text has been translated and the binaries rebuilt. It needn't be that way.
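
The corrupted-character item in the list above is easy to reproduce on a small scale. The Python sketch below is only an illustration under assumed names (`survives_code_page` and `render_in_code_page` are hypothetical helpers, not part of any product or library): it checks whether a string can be represented in a given code page, and shows the question marks a user would see when it cannot.

```python
# Hypothetical helpers for illustration; the function names are assumptions.

def survives_code_page(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def render_in_code_page(text, encoding):
    """Show what the text looks like after a lossy round trip through encoding.

    Characters missing from the encoding come back as question marks.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    # Western European characters fit in Windows code page 1252...
    print(survives_code_page("àüß", "cp1252"))      # True
    # ...but Japanese does not, and degrades to question marks.
    print(survives_code_page("日本語", "cp1252"))    # False
    print(render_in_code_page("日本語", "cp1252"))   # ???
    # The same Japanese text is fine under a Japanese code page.
    print(survives_code_page("日本語", "cp932"))     # True
```

A check like this only covers the encoding step; fonts, file-system paths and third-party components can still corrupt text that encodes cleanly.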

Internationalization testing is the process of pushing alien characters and situations down your software's throat to see what breaks. The more complex the software, the more complex the testing, such that there are companies that specialize in internationalization as much as if not more than localization.

It's not rocket science, but it doesn't happen on its own, either. And, you don't want your customers worldwide doing any more of your internationalization testing than absolutely necessary, because they really don't appreciate buying the product and then testing it.

The process requires some cooperation between Engineering and QA, which should already be in place for the domestic product and can easily be extended to the international products as well. An upcoming post will explain some of the tools and techniques for proper internationalization testing.


02 March 2007

Translation non-savings, Part II

Again I ask: How far will you go to improve your localization process? If a big improvement didn't save any obvious money, would your organization go for it?

I selected a sample of 180 files. In one set, I left all of the HTML tags and line-wrapping as they were; in the other set, I pulled out raw, unwrapped text without HTML tags. My assumption was that the translation memory tools would find more matches in the raw, unwrapped text than in the formatted text.

I cannot yet figure out how or why - let alone what to do about it - but the matching rate dropped as a result of this experiment.

Match category                Original HTML formatting and tags    Unwrapped, unformatted text
100% match and repetitions    65%                                  51%
95-99% match                  9%                                   14%
No match                      9%                                   15%

This is, as they say in American comedy, a revoltin' development. It means that the anticipated savings in translation costs won't be there - though I suspect that the translators themselves will spend more time aligning and copy-pasting than they will translating - and that I'll have to demonstrate process improvement elsewhere. If I can find an elsewhere.


True, the localization vendor will probably spend less time in engineering and file preparation, but I think I need to demonstrate to my client an internal improvement - less work, less time, less annoyance - rather than an external one.
