24 July 2008

"I can quit smoking whenever I want to."

"...I just don't want to."

Have you heard that one before? I heard something similar last week from a director of engineering:

"All of our strings are embedded in source code. This is deliberate, and we planned it very carefully."

How would you have reacted?

At first, I figured he was pulling my leg ("taking the mickey," "having me on," etc.). Then he explained the process of localizing strings in the GNU gettext model, which can live peacefully without external resource files.

A line of code reading

result = wx.MessageDialog(None, _("Welcome to my blog. Today is %s") % date.today())

uses the _ function in the English context as an identity function. In a localized context it will load the language pack built using the GNU gettext utilities and map each English string to its localized equivalent:

"Welcome to my blog. Today is %s" -> "Bienvenido a mi blog. Hoy es %s"

To redeem what seems like shortsightedness in allowing developers to embed strings in code, these utilities also contain scripts that can pull out all the English strings from source code and make localization packages, which translators can work on without danger of touching the code. Other scripts can push the localized strings back into place.
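To make the _ mechanism concrete, here is a minimal sketch using Python's standard gettext module. The DictTranslations class and its in-memory catalog are assumptions made for illustration; a real project would load compiled language packs from disk rather than a dictionary.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog; real projects load compiled .mo files."""

    def __init__(self, messages):
        super().__init__()
        self.messages = messages

    def gettext(self, message):
        # Fall back to the English source string when no translation exists.
        return self.messages.get(message, message)


source = "Welcome to my blog. Today is %s"

# English context: NullTranslations returns every string unchanged.
english = gettext.NullTranslations()

# Localized context: the same source string is the lookup key.
spanish = DictTranslations({
    source: "Bienvenido a mi blog. Hoy es %s",
})

for catalog in (english, spanish):
    _ = catalog.gettext
    print(_(source) % "24 July 2008")
```

The English string doubles as the key, which is why the strings can stay in the source code.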

Like .properties files in Java and .rc files in C++, these localization packages isolate non-code elements for easy localization. However, a programmer's coding mistake could still result in strings going undetected by the scripts, so I still plan to perform pseudo-translation and internationalization testing on this software as soon as possible.
Just in case the director of engineering can't quit smoking as easily as he thinks he can.
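The risk of strings going undetected can be sketched with a toy extractor. This is not how the gettext tools are implemented (xgettext does the real extraction); the regular expression and the sample source below are simplified assumptions that only handle double-quoted literals.

```python
import re

# Simplified pattern: a double-quoted literal passed directly to _().
MARKED_STRING = re.compile(r'(?<!\w)_\(\s*"((?:[^"\\]|\\.)*)"\s*\)')


def extract_marked_strings(source_code):
    """Return the string literals wrapped in _() in the order they appear."""
    return [match.group(1) for match in MARKED_STRING.finditer(source_code)]


# Hypothetical source file contents.
sample = '''
title = _("Network Settings")
label = _("Today is %s") % today
status = "Battery low"
'''

found = extract_marked_strings(sample)
print(found)
```

"Battery low" never reaches the translators because nobody wrapped it in _(). That is exactly the kind of string pseudo-translation exposes, since it stays in plain English on screen.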

29 November 2007

Keeping an eye on Catalyst

In localization, "Catalyst" is a tool from Alchemy Software. Among other things, it allows you to localize UI elements within software resource files, sometimes without the need to rebuild the software manually into binary format.

Since software binaries come from text files, part of Catalyst's value lies in straddling the divide between allowing the translator to change strings in these text files (say, from English to Japanese) and displaying them in the binary, run-time format in which the user will see them on screen.

Last month a vendor returned some resource files that we had asked them to localize from English to Japanese. I rebuilt the binaries (language-resource DLLs) and ran them. Unfortunately, a number of items were suddenly missing from the Japanese menus, so I had to troubleshoot the problem.

My first thought was that either a person or a tool (or a person using a tool) had modified something that should not be affected by the localization process. I had handed off a resource file containing these lines:

32777 MENU DISCARDABLE
BEGIN
POPUP "&Tools"
BEGIN
MENUITEM "Serial P&ort Settings...", ID_TOOLS_SERIALPORTSETTINGS
MENUITEM "&Network Settings...", ID_TOOLS_NETWORK
MENUITEM "&Battery Settings...", ID_TOOLS_BATTERYSETTINGS
END
END

32779 MENU DISCARDABLE
BEGIN
POPUP "&File"
END


They returned to me a resource file containing these strings:

9 MENU DISCARDABLE
BEGIN
POPUP "ツール(&T)"
BEGIN
MENUITEM "シリアルポートの設定(&O)...", ID_TOOLS_SERIALPORTSETTINGS
MENUITEM "ネットワーク設定(&N)...", ID_TOOLS_NETWORK
MENUITEM "バッテリの設定(&B)...", ID_TOOLS_BATTERYSETTINGS
END
END

11 MENU DISCARDABLE
BEGIN
POPUP "ファイル(&F)"
END

There was nothing wrong with the translation, and the string IDs were intact. The product has long been "double-byte clean," so I knew that the software was not gagging on the Japanese characters.

The problem lay in the menu ID numbers, which are 32777 and 32779 in the English, but which came back in the Japanese files as 9 and 11. The vendor believes that Catalyst changed them, since they had used it for resizing and QA.

Normally, this renumbering has no effect on how the binary functions. In this case, however, the effect is profound: code somewhere in the software looks for "32777" and "32779," and when it doesn't find those IDs, it cannot complete the menu. This is poor internationalization in the code base, which I have discussed with Engineering to no avail, so I need to police the resource files in each round of localization.
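Policing the files by eye gets old quickly, so here is a minimal sketch of an automated check. It assumes that menus appear in the same order in both files and that each menu header sits on its own line in the form shown above; the function names are made up for illustration, and reading the files from disk (with the right encoding) is left out.

```python
import re
from itertools import zip_longest

# A menu header in an .rc file looks like "32777 MENU DISCARDABLE".
MENU_HEADER = re.compile(r"^[ \t]*(\w+)[ \t]+MENU\b", re.MULTILINE)


def menu_ids(rc_text):
    """Return the menu resource IDs in the order they appear."""
    return MENU_HEADER.findall(rc_text)


def changed_menu_ids(source_rc, localized_rc):
    """Pair menus by position and return the (source, localized) IDs that differ."""
    pairs = zip_longest(menu_ids(source_rc), menu_ids(localized_rc))
    return [(src, loc) for src, loc in pairs if src != loc]


english = '''32777 MENU DISCARDABLE
BEGIN
POPUP "&Tools"
BEGIN
MENUITEM "&Network Settings...", ID_TOOLS_NETWORK
END
END

32779 MENU DISCARDABLE
BEGIN
POPUP "&File"
END
'''

japanese = '''9 MENU DISCARDABLE
BEGIN
POPUP "ツール(&T)"
BEGIN
MENUITEM "ネットワーク設定(&N)...", ID_TOOLS_NETWORK
END
END

11 MENU DISCARDABLE
BEGIN
POPUP "ファイル(&F)"
END
'''

for src, loc in changed_menu_ids(english, japanese):
    print(f"menu ID changed: {src} -> {loc}")
```

Run against each vendor delivery before rebuilding, a check like this would flag the renumbering before anyone has to wonder why the menus are incomplete.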

How is Catalyst working for you? Have you seen similar problems?

Interested in this topic? You might enjoy another article I've written called "Localized Binaries - The Plot Thickens"

06 March 2007

How to pseudo-translate, Part I

Before you localize your software product, wouldn't you like to have an idea of what's going to break as a result?

If you've written it in English, it may surprise and alarm you to learn that this is no assurance that it will work when the user interface (UI) is in Chinese or Arabic or maybe even Spanish. The most conspicuous vulnerabilities are:
  • text swell, in which "prompt" becomes "Eingabeaufforderung" in German, for example, and the 40 pixels of width you've reserved in the English UI show only a small part of the German;
  • corrupted characters, which will show up in the UI as question marks or little black boxes because characters such as à, ü, ¿, ß, Ø and 日本語 aren't in the code page or encoding under which your software is compiled;
  • illegible or invalid names of files and paths, which occur when installing your software on an operating system that will handle more kinds of characters than your product will;
  • crashes, which occur when your software mishandles the strange characters so badly that the program just giggles briefly and then dies;
  • ethnocentric business logic, which leads to ridiculous results when users select unanticipated countries or currencies;
  • hard-coded anything, whether currency symbols, standards of measurement (metric vs. English) or UI strings.
In the past, localization efforts have become stranded on these beaches late in the voyage, after the text has been translated and the binaries rebuilt. It needn't be that way.
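The "corrupted characters" item in the list above is easy to demonstrate. The sketch below uses Python's built-in codecs to show which of the sample characters a given code page cannot represent; the helper name unencodable is made up for illustration, and cp1252 (Western European Windows) is just one example of a code page.

```python
def unencodable(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


sample = "àü¿ßØ日本語"

# The Western European characters fit in cp1252; the Japanese ones do not.
print(unencodable(sample, "cp1252"))

# With errors="replace", each unrepresentable character becomes "?",
# which is what ends up in front of the user.
print(sample.encode("cp1252", errors="replace"))
```

The same check against plain ASCII loses every character in the sample, which is roughly what happens when software quietly assumes English-only text.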

Internationalization testing is the process of pushing alien characters and situations down your software's throat to see what breaks. The more complex the software, the more complex the testing, to the point that some companies specialize in internationalization as much as, if not more than, localization.

It's not rocket science, but it doesn't happen on its own, either. And, you don't want your customers worldwide doing any more of your internationalization testing than absolutely necessary, because they really don't appreciate buying the product and then testing it.

The process requires some cooperation between Engineering and QA, which should already be in place for the domestic product and can easily be extended to the international products as well. An upcoming post will explain some of the tools and techniques for proper internationalization testing.

07 December 2006

Localized Binaries - The Plot Thickens

The engineer has demonstrated that it is no longer possible to build just the resource binaries; it is now necessary to build the entire blinking product.

"Why is that?" I ask.

"We've improved the makefile," he replies. The makefile is a script used by the make command to build binaries.

"That doesn't feel like an improvement to me," I venture. "Why can't I just build the two or three resource binaries I need? I don't need all of the executables and other rot."

"Yes, well, we've improved the makefile."

"But there was a small, localized makefile that lived in each of the directories of the resource binaries I wanted. What happened to them?"

"We improved the main makefile by rolling all of those lower-level makefiles into it."

That's a hint to me that they improved it for the purpose of creating all of the files that go into the installer, but that's far and away more files than I want. It also means that it's probably going to take me a half-hour now to build binaries that used to take about six seconds each.

Had they been following good I18n hygiene, they'd have asked themselves (or me, even) whether there were costs associated with consolidating all of the lower-level makefiles and eliminating the possibility of rebuilding except in this huge batch. The costs don't really affect them that much, though they'll slow me down somewhat.

It's an "improvement."

27 November 2006

Pulling the rug out from under the Localization Manager

It's a thrill for the gearhead in me to build my own localized binaries.

Most projects in the wide world don't require this of the localization manager, of course. A staff engineer, or at least a release engineer, is usually tasked with building the binaries that house the localized software resources. There's some delay involved in that, though, since the engineers don't often place very high priority on building these infernal things, let alone building them as often as the localization QA cycles require.

The localizers are able to preview the localized resources in their localization environment (Alchemy, Visual Studio, etc.), but our engineers have an arcane build environment and procedure that I don't care to impose on even my least liked localization vendor, simply because it's an open invitation to failure. Instead, I persuaded the engineer who created the entire scheme to spend three hours duplicating the environment with me so that I could document it and reproduce it on a quick turnaround.

"Why are you going to all this trouble?" the engineer asked me.

"I'm trying to drive you crazy this one time so that I don't drive you crazy eight or nine times over the next few weeks. The translators will find new things to change as they continue localizing the rest of the product, and they'll change the resource files. If I have to bug you with each one of these changes, you may come to view localization as something, well, inconvenient."

"Good. Thanks for sparing me that."

That was for version 2.0.0 of this software. I was able to save precious days by doing my own builds and turning the binaries around to the localizers promptly. I also saved myself all of the credibility and Brownie points I'd have had to mortgage by running to the engineers all the time.

Now that we're localizing version 2.0.1, however, the procedure has changed. The engineers have pulled the rug out from under me, and nothing that used to work still works. Time to bug the engineer again and get the updated Rosetta Stone so that I can build these things.

20 August 2006

Bad internationalization practice

Unfortunately, there's been another architecture change besides the move to .NET: Engineering has split the resource DLL into two pieces.

This is not bad news in itself, but there is a tricky dimension to putting the two DLLs together at run time, and the engineers have handled it in a way that assumes a little too much.

The main menu contains the usual entries (File, Edit, View, Tools, Windows, Help), each of which contains a submenu. The localization hiccup is that some of the submenu items live in one DLL, and the others live in the other DLL. What brings them together at run-time? The software depends on the presence of the string "&Edit" in each one. What happens when "&Edit" gets translated? "Oh, well, I guess we didn't think of that..."

The pseudo-translated string reads "&ßéüdßéüt". The sets of submenu items don't find one another in the DLLs at run-time, so they simply don't show up in the menus. Another triumph for the farsightedness of internationalization testing, and back to the drawing board for the developers.
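The failure can be sketched in a few lines. Everything below is hypothetical: the data layout, the ID_MENU_EDIT identifier and the item names are stand-ins, since the post does not show the real code. The point is only the contrast between matching on a translatable caption and matching on a stable ID.

```python
def items_by_caption(dlls, caption):
    """Collect submenu items from every DLL whose menu caption matches exactly."""
    return [item
            for menus in dlls
            for menu in menus
            if menu["caption"] == caption
            for item in menu["items"]]


def items_by_id(dlls, menu_id):
    """Collect submenu items from every DLL whose menu ID matches."""
    return [item
            for menus in dlls
            for menu in menus
            if menu["id"] == menu_id
            for item in menu["items"]]


def make_dlls(caption):
    # Two hypothetical resource DLLs, each holding part of the Edit submenu.
    return [
        [{"id": "ID_MENU_EDIT", "caption": caption, "items": ["Undo", "Cut"]}],
        [{"id": "ID_MENU_EDIT", "caption": caption, "items": ["Find", "Replace"]}],
    ]


english_dlls = make_dlls("&Edit")
pseudo_dlls = make_dlls("&ßéüdßéüt")

print(items_by_caption(english_dlls, "&Edit"))   # all four items
print(items_by_caption(pseudo_dlls, "&Edit"))    # nothing: the caption was translated
print(items_by_id(pseudo_dlls, "ID_MENU_EDIT"))  # all four items again
```

Keying the merge on something translators never touch would let the submenus find one another in any language.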

16 August 2006

Pseudo-translating the resource files

I probably shouldn't enjoy this stuff so much, but I'm a gearhead at heart, so I get a lot of gratification from climbing around inside resource files.

One of the unsung virtues of localization consulting is pseudo-translation and subsequent QA. The goal is to replace the source (in this case, English) strings with well thought-out gibberish, in an effort to make the software barf. This can take a number of forms, such as:
  • truncated strings
  • corrupted characters
  • hard-coded strings
  • expanses of blank space where strings should be; and
  • crashes (my favorite)
I'm not really all that happy that I've caused the software to crash, but at least it vindicates the function of localization project management in general and pseudo-translation in particular in a way that even the most jaded developer cannot ignore.
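For the curious, here is a minimal sketch of a pseudo-translator. The rule it applies, replacing every vowel with "ßéü", is an assumption chosen because it turns "&Edit" into the "&ßéüdßéüt" string mentioned in the post "Bad internationalization practice"; commercial tools offer far more elaborate schemes. It also assumes printf-style placeholders, which it leaves alone so the software can still substitute values.

```python
import re

# Leave printf-style placeholders such as %s or %d untouched.
PLACEHOLDER = re.compile(r"%[sdif]")
VOWELS = "aeiouAEIOU"
FILLER = "ßéü"


def swap_vowels(chunk):
    """Replace each vowel with accented filler; keep everything else."""
    return "".join(FILLER if ch in VOWELS else ch for ch in chunk)


def pseudo_translate(text):
    """Pseudo-translate `text`, preserving placeholders and accelerator keys."""
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(text):
        parts.append(swap_vowels(text[pos:match.start()]))
        parts.append(match.group(0))
        pos = match.end()
    parts.append(swap_vowels(text[pos:]))
    return "".join(parts)


for english in ("&Edit", "Today is %s", "&Network Settings..."):
    print(english, "->", pseudo_translate(english))
```

The output is longer than the English (text swell), full of non-ASCII characters (corruption) and obviously not English (so any hard-coded string stands out on screen), which covers most of the list above in one pass.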
