30 October 2008

Wordcount Woes - Part 3

How about those engineers who are certain that all of the strings have been externalized from the code?

I don't know about you, but I stopped believing them a long time ago.

Pseudo-translating the code is the definitive way of showing them the strings they've missed. It requires a bit of time and, frankly, some cooperation from the very engineers you're about to embarrass, but it's the sure way to find strings still embedded in code.

Many engineers overlook the installer, also. There is usually a script or value-pair file with custom strings, and it's easy to forget to externalize strings to the file. It's also easy to specify the wrong encoding for the file, such that all of the custom strings show up corrupted in the installer. We see that a lot with InstallShield projects.

Mind you, I'm never out to get the engineers - I need them too dearly - but they sometimes get to believing their own stuff and thinking that internationalization (I18n) is kaltes Kaffee, or yesterday's news. It is yesterday's news, but that doesn't mean it's unimportant.

Where do you find strings that engineers overlook?

Note: I've posted less frequently of late because I'm between projects (and setting up another blog). Once L1on activity resumes with a couple of clients, I'll have more war stories again.

Labels: , ,

24 July 2008

"I can quit smoking whenever I want to."

"...I just don't want to."

Have you heard that one before? I heard something similar last week from a director of engineering:

"All of our strings are embedded in source code. This is deliberate, and we planned it very carefully."

How would you have reacted?

At first, I figured he was pulling my leg ("taking the mickey," "having me on," etc.). Then he explained the process of localizing strings in the gnu gettext model, which can live peacefully without external resources.

A line of code reading

result = wx.MessageDialog(_("Welcome to my blog. Today is %s"), date.today)

uses the _ function in the English context as an identity function. In a localized context it will load the language pack built using the gnu gettext utilities and map the English strings to the localized equivalent:

"Welcome to my blog. Today is %s" -> "Bienvenido a mi blog. Hoy es %s"

To redeem what seems like shortsightedness in allowing developers to embed strings in code, these utilities also contain scripts that can pull out all the English strings from source code and make localization packages, which translators can work on without danger of touching the code. Other scripts can push the localized strings back into place.

Like .properties files in Java and .rc files in C++, these localization packages isolate non-code elements for easy localization. However, a programmer's coding mistake could still result in strings going undetected by the scripts, so I still plan to perform pseudo-translation and internationalization testing on this software as soon as possible.
Just in case the director of engineering can't quit smoking as easily as he thinks he can.

Labels: , , ,

19 June 2008

Giant Localization Leap Backwards

"All of the strings are embedded in the code."

There was a time when I welcomed - or at least was not very much surprised by - sentences like this one. They came from engineers in response to my questions about the readiness of their software strings to be localized. Strings embedded in code, of course, are more or less inaccessible to localization techniques, since nobody wants to hand off an entire code base to a translator, and no translator wants to wade through an entire code base trying to find strings to translate.

So, when one of my client's engineers said it to me yesterday in reference to an application in a larger product we plan to localize, I briefly welcomed it. It means more work.

But then I realized that combing all of the strings out of the code and into separate, accessible files will require a great deal of time and effort (not mine). Engineers don't usually enjoy working on this kind of task, so it will fall to the bottom of the priority stack, and the product manager won't go to bat for it, and so this particular application will stick out like a sore thumb as a non-localized component in an otherwise localized product suite.

"Is there a phased approach we could take to enabling this app for localization?" the engineer asked.

I appreciated his attempt to save the game, but a partially localized product is rather ugly. We could enable and translate the menu and dialog strings for this release, and go back for the error messages in the next release, but the mongrel product is not very appealing to users in the meantime.

This is disappointing, because we've made such long localization-strides elsewhere in the product suite, and dealing with this newly acquired app feels like such a giant leap backwards. I guess I'll work up some estimates on the time required to enable the application, then make my case to the product manager and development lead to generate some interest and start the process from the beginning.

Isn't that why we localization project managers and international product managers were sent here?

What do you do in your company when engineers tell you that all the strings are embedded in the code?

Labels: , , , , , , ,

29 November 2007

Keeping an eye on Catalyst

In localization, "Catalyst" is a tool from Alchemy Software. Among other things, it allows you to localize UI elements within software resource files, sometimes without the need to rebuild the software manually into binary format.

Since software binaries come from text files, part of Catalyst's value lies in straddling the divide between allowing the translator to change strings in the these text files (say, from English to Japanese) and displaying them in the binary, run-time format in which the user will see them on screen.

Last month a vendor returned some resource files to me which we had them localize from English to Japanese. I rebuilt the binaries (language-resource DLLs) and ran them. Unfortunately, a number of items were suddenly missing from the Japanese menus, so I had to troubleshoot the problem.

My first thought was that either a person or a tool (or a person using a tool) had modified something that should not be affected by the localization process. I had handed off a resource file containing these lines:

32777 MENU DISCARDABLE
BEGIN
POPUP "&Tools"
BEGIN
MENUITEM "Serial P&ort Settings...", ID_TOOLS_SERIALPORTSETTINGS
MENUITEM "&Network Settings...", ID_TOOLS_NETWORK
MENUITEM "&Battery Settings...", ID_TOOLS_BATTERYSETTINGS
END
END

32779 MENU DISCARDABLE
BEGIN
POPUP "&File"
END


They returned to me a resource file containing these strings:

9 MENU DISCARDABLE
BEGIN
POPUP "ツール(&T)"
BEGIN
MENUITEM "シリアルポートの設定(&O)...", ID_TOOLS_SERIALPORTSETTINGS
MENUITEM "ネットワーク設定(&N)...", ID_TOOLS_NETWORK
MENUITEM "バッテリの設定(&B)...", ID_TOOLS_BATTERYSETTINGS
END
END

11 MENU DISCARDABLE
BEGIN
POPUP "ファイル(&F)"
END

There was nothing wrong with the translation, and the string IDs were intact. The product has long been "double-byte clean," so I knew that the software was not gagging on the Japanese characters.

The problem lay in the menu ID numbers, which are 32777 and 32779 in the English, but which came back in the Japanese files as 9 and 11. The vendor believes that Catalyst changed them, since they had used it to for resizing and QA.

Normally, this renumbering has no effect on how the binary functions. In this case, however, it has a profound effect on how the binary functions, because there is code somewhere in the software that is looking for "32777" and "32779" and when it doesn't find those ID's, it cannot complete the menu. This is poor internationalization in the code base which I have discussed with Engineering, to no avail, so I need to police the resource files in each round of localization.

How is Catalyst working for you? Have you seen similar problems?

Interested in this topic? You might enjoy another article I've written called "Localized Binaries - The Plot Thickens"

Labels: , , , , , ,

24 August 2007

Right hand and left hand in localization

When was the last time upper management was more enthusiastic about localization than you were ready for? Would you like to have that problem? Are you sure?

A colleague at a very large IT company wrote me that he started as localization manager four months ago, with the direction-from-above of taking his division's products into 5-7 languages this year and twice as many in the next couple of years. Who wouldn't want a charter like that? Most of us would sell our souls to Voldemort for it.

He's discovering to his dismay that most of the software hasn't been internationalized - lots of UI still embedded in code - which will drag out the timeline quite a bit. Instead of managing localization process, he'll spend the foreseeable future managing architecture, internationalization and product. He was ready to jump on the localization horse and ride it into the sunset, but is now finding out that the horse hasn't been born yet. It's parents have barely met, for that matter.

That's the result of the right hand not knowing what the left hand is doing: the right hand promises, and the left hand has to deliver. It's not uncommon in most organizations, but in this case it's overseas offices and customers who will suffer, as if they weren't already marginalized enough.

My colleague laments, "This isn't the 9-to-5 job I was expecting."

As if.

Labels: , , ,

16 March 2007

How to pseudo-translate, Part II

You only speak one language, so maybe you'll never be a translator, but you have a chance as a pseudo-translator.

Pseudo-translation is the process of replacing or adding characters to your software strings to try and break the software, or at least uncover strings that are still embedded in the code and need to be externalized for proper localization. (Part I of this post describes why anybody would want to do such a thing.) Pseudo-translation is a big piece of internationalization (I18n), which you should undertake before you bother handing anything off to the translators.

Here's an example of a few strings from a C resource file, with their respective, pseudo-translations:

IDS_TITLE_OPEN_SKIN "Select Device"
IDS_TITLE_OPEN_SKIN "日本Sイlイct Dイvウcイ本日"

IDS_MY_FOLDER "Directory:"
IDS_MY_FOLDER "日本Dウrイctエrユ:本本"

IDS_MY_OPEN "&Open"
IDS_MY_OPEN "日本&Opイn日"

IDS_WINDOW_NOT_ENOUGH_MEM
"Windows has not enough memory. You may lower the heap size specified in the configuration file."
IDS_WINDOW_NOT_ENOUGH_MEM
"日本Wウndエws hアs nエt イnエオgh mイmエrユ. Yエオ mアユ lエwイr thイ hイアp sウzイ spイcウfウイd ウn thイ cエnfウgオrアtウエn fウlイ.本日本日日本本本日日本日日本日本日本日本"

IDS_TARGET_INITIALIZATION_FAILED
"Failed to load or initialize the target."
IDS_TARGET_INITIALIZATION_FAILED
"日本Fアウlイd tエ lエアd エr ウnウtウアlウzイ thイ tアrgイt.日日本日本日本本"

In these strings, Japanese characters have been pushed in to replace the vowels in all English words. The goal of using Ja characters is to ensure that, when compiled, the strings will look and behave as they should under Windows Japanese; it's important to pseudo-translate with the right result in mind.

Some observations:
  1. Each string begins with Ja characters, since that will be the case in the real Japanese translation, and it's a situation worth testing.
  2. Each string contains enough English characters to allow the tester to "gist" the string from the context. This is helpful because pseudo-translation can often destroy the meaning of the string.
  3. Each string has a ratio of swell, with trailing characters adding 20% to the length of the string. This helps flush out fields and controls in which strings will be truncated.
Okapi Rainbow is an excellent (if somewhat inscrutable) text-manipulation utility for just this purpose. When run on all of the string files in the development project, the result is a set of resources which, when recompiled, will run as a pseudo-translated binary. With a testbench running the appropriate operating system, a tester can get a good idea of the I18n work in store for the developers.

Rare is the product that passes pseudo-translation testing on the first try, either because of strings left behind in the code, resizing issues, string truncation, buffer overflows, or just plain bad luck.

Even if your code isn't perfect, though, look on the bright side: You're now a pseudo-translator.

Labels: , , , , ,

06 March 2007

How to pseudo-translate, Part I

Before you localize your software product, wouldn't you like to have an idea of what's going to break as a result?

If you've written it in English, it will surprise and alarm you to learn that that's no assurance that it will work when the user interface (UI) is in Chinese or Arabic or maybe even Spanish. The most conspicuous vulnerabilities are:
  • text swell, in which "prompt" becomes "Eingabeausforderung" in German, for example, and the 40 pixels of width you've reserved in the English UI results in only a small part of the German appearing;
  • corrupted characters, which will show up in the UI as question marks or little black boxes because characters such as à, ü, ¿, ß, Ø and 日本語 aren't in the code page or encoding under which your software is compiled;
  • illegible or invalid names of files and paths, which occur when installing your software on an operating system that will handle more kinds of characters than your product will;
  • crashes, which occur when your software mishandles the strange characters so badly that the program just giggles briefly and then dies;
  • ethnocentric business logic, which leads to ridiculous results when users select unanticipated countries or currencies;
  • hard-coded anything, whether currency symbols, standards of measurement (metric vs. English) or UI strings.
In the past, localization efforts have become stranded on these beaches late in the voyage, after the text has been translated and the binaries rebuilt. It needn't be that way.

Internationalization testing is the process of pushing alien characters and situations down your software's throat to see what breaks. The more complex the software, the more complex the testing, such that there are companies that specialize in internationalization as much as if not more than localization.

It's not rocket science, but it doesn't happen on its own, either. And, you don't want your customers worldwide doing any more of your internationalization testing than absolutely necessary, because they really don't appreciate buying the product and then testing it.

The process requires some cooperation between Engineering and QA, which should already be in place for the domestic product and can easily be extended to the international products as well. An upcoming post will explain some of the tools and techniques for proper internationalization testing.

Labels: , , , , , , ,