21 August 2008

Localizing Code Snippets - Part II

Last week I posted on the dilemma of how to localize Code Snippets, the selected pieces of your documentation that you shoehorn into XML files so that Visual Studio can present them, tool-tip fashion, to developers while they write code against the interfaces you document.

My goal was to ensure that the process of grabbing these bits of documentation (mostly one-sentence descriptions and usage tips) was internationalized, so that we could run it on translated documentation and save money. This has proved more difficult than anticipated.

Here is the lesson: If you think it's hard to get internal support for internationalizing your company's revenue-generating products, just try to get support for internationalizing the myriad hacks, scripts, macros and shortcuts your developers use to create those products.

In this client's case, it makes more sense to translate the documentation first, then reuse that translation memory on all of the Code Snippet files derived from it. In the short run it will cost more money (mostly for translation engineering and QA rather than for new translation), but in the long run it means less headache and delay. Not to mention fewer battles for me to fight.

Discretion is the better part of localization valor.

14 August 2008

Localizing Code Snippets

"Why would I localize code snippets?" you ask. (Go ahead; ask.)

Everybody knows you don't translate snippets of code. Even if you found a translator brave enough to take on something like int IBACKLIGHT_GetBacklightInfo(IBacklight *p, AEEBacklightInfo * pBacklightInfo), the compiler would just laugh and spit out error messages.

However, if you're a developer (say, of Windows applications) working in an integrated development environment (say, Microsoft Visual Studio), you may want to refer very quickly to the correct syntax and description of a feature without searching for it in the reference manual. The Code Snippet enhancement to Visual Studio makes this possible with a small popup box that contains thumbnail documentation on the particular interface the developer wants to use. It's similar in concept and appearance to the "What's This?" contextual help offered by right-clicking on options in many Windows applications.

How does the thumbnail documentation get in there? It's a tortuous path, but the enhancement pulls text from XML-formatted .snippet files. You can fill the .snippet files with the information yourself, or you can populate them from your main documentation source using Perl scripts and XSL transformation. So while you're not really translating code snippets, you're translating Code Snippets.

And therein lies the problem.


One of our clients is implementing Code Snippets, but the Perl scripts and XSL transformation scripts they're using to extract the documentation don't support Unicode. I found this out when I pseudo-translated some of the source documentation and ran the scripts on it. Much of the text didn't make it into the .snippet files, so we're on a quest to find the offending portions of the scripts and suggest internationalization changes.
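
The failure is easy to reproduce in miniature. The sketch below is in Python rather than the client's Perl and XSLT, and everything in it is an assumption for illustration: lossy_extract stands in for a script that quietly assumes 7-bit ASCII, unicode_extract decodes the source as UTF-8, and build_snippet writes a bare-bones .snippet header whose element names (CodeSnippets, CodeSnippet, Header, Title, Description) follow the Visual Studio snippet schema as I understand it. A real .snippet file also needs a Snippet section, omitted here.

```python
import xml.etree.ElementTree as ET

# Namespace of the Visual Studio Code Snippet schema (as I understand it).
SNIPPET_NS = "http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet"


def lossy_extract(raw: bytes) -> str:
    """Mimics a script that assumes 7-bit ASCII: anything else silently vanishes."""
    return raw.decode("ascii", errors="ignore")


def unicode_extract(raw: bytes) -> str:
    """Decodes the documentation in the encoding it was saved in (assumed UTF-8)."""
    return raw.decode("utf-8")


def build_snippet(title: str, description: str) -> bytes:
    """Writes a bare-bones .snippet header; a real file also needs a Snippet section."""
    ET.register_namespace("", SNIPPET_NS)  # emit the schema as the default namespace
    root = ET.Element(f"{{{SNIPPET_NS}}}CodeSnippets")
    snippet = ET.SubElement(root, f"{{{SNIPPET_NS}}}CodeSnippet", Format="1.0.0")
    header = ET.SubElement(snippet, f"{{{SNIPPET_NS}}}Header")
    ET.SubElement(header, f"{{{SNIPPET_NS}}}Title").text = title
    ET.SubElement(header, f"{{{SNIPPET_NS}}}Description").text = description
    # Serialize as UTF-8 with an explicit XML declaration so non-ASCII text survives.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Fed the pseudo-translated string "日本Sイlイct Dイvウcイ本日", lossy_extract returns "Slct Dvc", which is roughly the kind of damage we saw in the .snippet files; the UTF-8 path keeps every character all the way into the XML.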

We've determined that translated documentation in the Code Snippets will display properly in Visual Studio; the perilous part of the journey is extracting the desired subset of documentation and pouring it into the .snippet files. Don't expect your developers to make those scripts Unicode-safe on their own; you'll probably have to persist, politely, to get it done right.

Alternatives:
  • Wait until all of your documentation has been translated, then translate the .snippet files. It's more time-consuming and it will cost you more, but working this far downstream may be easier than getting your developers to clean up their scripts.
  • Make your Japanese developers tolerate English documentation in the Code Snippets.
Neither one is really the Jedi way. Work with your developers on this.

19 June 2008

Giant Localization Leap Backwards

"All of the strings are embedded in the code."

There was a time when I welcomed - or at least was not very much surprised by - sentences like this one. They came from engineers in response to my questions about the readiness of their software strings to be localized. Strings embedded in code, of course, are more or less inaccessible to localization techniques, since nobody wants to hand off an entire code base to a translator, and no translator wants to wade through an entire code base trying to find strings to translate.

So, when one of my client's engineers said it to me yesterday in reference to an application in a larger product we plan to localize, I briefly welcomed it. It means more work.

But then I realized that combing all of the strings out of the code and into separate, accessible files will require a great deal of time and effort (not mine). Engineers don't usually enjoy working on this kind of task, so it will fall to the bottom of the priority stack, and the product manager won't go to bat for it, and so this particular application will stick out like a sore thumb as a non-localized component in an otherwise localized product suite.
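
For readers who haven't watched it happen, here is what combing a single string out of the code amounts to, as a minimal Python sketch. The app in question isn't Python; the function names and the ID "text" file format (borrowed from resource-file string tables) are assumptions for illustration only.

```python
import re


# Before: the caption is buried in the logic, where no translator will ever find it.
def open_button_label_hardcoded() -> str:
    return "&Open"


# After: the code asks for a string ID, and the text lives in a separate file.
# The ID "text" line format imitates a resource-file string table; escape
# sequences and doubled quotes inside the text are kept exactly as written.
STRING_LINE = re.compile(r'^\s*(IDS_\w+)\s+"(.*)"\s*$')


def load_string_table(text: str) -> dict:
    """Parse ID "text" lines into a dict of string ID -> string; skip anything else."""
    table = {}
    for line in text.splitlines():
        match = STRING_LINE.match(line)
        if match:
            table[match.group(1)] = match.group(2)
    return table


def open_button_label(table: dict) -> str:
    # Same caption as before, but now whichever language the table holds.
    return table["IDS_MY_OPEN"]
```

The payoff is that translators receive the string file and never touch the code. The catch is that somebody has to do this for every string in the application, which is where the time and effort go.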

"Is there a phased approach we could take to enabling this app for localization?" the engineer asked.

I appreciated his attempt to save the game, but a partially localized product is rather ugly. We could enable and translate the menu and dialog strings for this release, and go back for the error messages in the next release, but the mongrel product is not very appealing to users in the meantime.

This is disappointing, because we've made such long localization strides elsewhere in the product suite, and dealing with this newly acquired app feels like a giant leap backwards. I guess I'll work up some estimates of the time required to enable the application, then make my case to the product manager and development lead to generate some interest and start the process from the beginning.

Isn't that why we localization project managers and international product managers were sent here?

What do you do in your company when engineers tell you that all the strings are embedded in the code?

16 March 2007

How to pseudo-translate, Part II

You only speak one language, so maybe you'll never be a translator, but you have a chance as a pseudo-translator.

Pseudo-translation is the process of replacing or adding characters to your software strings to try and break the software, or at least uncover strings that are still embedded in the code and need to be externalized for proper localization. (Part I of this post describes why anybody would want to do such a thing.) Pseudo-translation is a big piece of internationalization (I18n), which you should undertake before you bother handing anything off to the translators.

Here are a few strings from a C resource file, each followed by its pseudo-translation:

IDS_TITLE_OPEN_SKIN "Select Device"
IDS_TITLE_OPEN_SKIN "日本Sイlイct Dイvウcイ本日"

IDS_MY_FOLDER "Directory:"
IDS_MY_FOLDER "日本Dウrイctエrユ:本本"

IDS_MY_OPEN "&Open"
IDS_MY_OPEN "日本&Opイn日"

IDS_WINDOW_NOT_ENOUGH_MEM
"Windows has not enough memory. You may lower the heap size specified in the configuration file."
IDS_WINDOW_NOT_ENOUGH_MEM
"日本Wウndエws hアs nエt イnエオgh mイmエrユ. Yエオ mアユ lエwイr thイ hイアp sウzイ spイcウfウイd ウn thイ cエnfウgオrアtウエn fウlイ.本日本日日本本本日日本日日本日本日本日本"

IDS_TARGET_INITIALIZATION_FAILED
"Failed to load or initialize the target."
IDS_TARGET_INITIALIZATION_FAILED
"日本Fアウlイd tエ lエアd エr ウnウtウアlウzイ thイ tアrgイt.日日本日本日本本"

In these strings, Japanese katakana characters have replaced the lowercase vowels (and y) in every English word. The goal of using Japanese (Ja) characters is to ensure that, when compiled, the strings will look and behave as they should under Windows Japanese; it's important to pseudo-translate with the right result in mind.

Some observations:
  1. Each string begins with Ja characters, since that will be the case in the real Japanese translation, and it's a situation worth testing.
  2. Each string contains enough English characters to allow the tester to "gist" the string from the context. This is helpful because pseudo-translation can often destroy the meaning of the string.
  3. Each string swells by roughly 20%, with trailing characters padding out its length. This helps flush out fields and controls in which strings will be truncated.

Okapi Rainbow is an excellent (if somewhat inscrutable) text-manipulation utility for just this purpose. Run on all of the string files in the development project, it produces a set of resources that, when recompiled, run as a pseudo-translated binary. With a testbench running the appropriate operating system, a tester can get a good idea of the I18n work in store for the developers.
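
If you'd like to see the mechanics rather than take them on faith, here is a minimal Python sketch of the same three rules. It is not how Okapi Rainbow does it; the vowel-to-katakana table is read off the examples above, and the 日本 prefix and padding characters are arbitrary choices.

```python
import itertools

# Lowercase vowels (and y) -> katakana, read off the examples above.
# Capitals, punctuation and '&' hotkey markers are left alone so testers can gist the text.
VOWEL_TO_KATAKANA = str.maketrans(
    {"a": "ア", "e": "イ", "i": "ウ", "o": "エ", "u": "オ", "y": "ユ"}
)
PREFIX = "日本"         # every string starts with Japanese characters
PADDING_CHARS = "日本"  # cycled to build the trailing swell


def pseudo_translate(text: str, swell_percent: int = 20) -> str:
    """Return a pseudo-translated copy of one UI string."""
    body = text.translate(VOWEL_TO_KATAKANA)
    # Integer ceiling of len(text) * swell_percent / 100, so short strings still grow.
    pad_len = (len(text) * swell_percent + 99) // 100
    padding = "".join(itertools.islice(itertools.cycle(PADDING_CHARS), pad_len))
    return PREFIX + body + padding
```

pseudo_translate("&Open") returns "日本&Opイn日", matching the example above. One caution: a real tool has to skip placeholders and escape sequences, because this sketch would happily turn a %u format specifier into %オ.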

Rare is the product that passes pseudo-translation testing on the first try, whether because of strings left behind in the code, resizing issues, string truncation, buffer overflows or just plain bad luck.

Even if your code isn't perfect, though, look on the bright side: You're now a pseudo-translator.

06 March 2007

How to pseudo-translate, Part I

Before you localize your software product, wouldn't you like to have an idea of what's going to break as a result?

If you've written it in English, you may be surprised and alarmed to learn that this is no assurance it will work when the user interface (UI) is in Chinese or Arabic or maybe even Spanish. The most conspicuous vulnerabilities are:
  • text swell, in which "prompt" becomes "Eingabeaufforderung" in German, for example, so the 40 pixels of width you reserved in the English UI show only a small part of the German;
  • corrupted characters, which will show up in the UI as question marks or little black boxes because characters such as à, ü, ¿, ß, Ø and 日本語 aren't in the code page or encoding under which your software is compiled;
  • illegible or invalid names of files and paths, which occur when installing your software on an operating system that will handle more kinds of characters than your product will;
  • crashes, which occur when your software mishandles the strange characters so badly that the program just giggles briefly and then dies;
  • ethnocentric business logic, which leads to ridiculous results when users select unanticipated countries or currencies;
  • hard-coded anything, whether currency symbols, standards of measurement (metric vs. English) or UI strings.
In the past, localization efforts have become stranded on these beaches late in the voyage, after the text has been translated and the binaries rebuilt. It needn't be that way.

Internationalization testing is the process of pushing alien characters and situations down your software's throat to see what breaks. The more complex the software, the more complex the testing, so much so that some companies specialize in internationalization as much as, if not more than, localization.
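
Here is about the smallest possible example of pushing alien characters at something, and of where the question marks come from. This Python sketch is an illustration, not a test tool: it sends text through a legacy Windows code page and back, which is roughly what a non-Unicode build does to it. cp1252 is the Western European code page and cp932 the Japanese one; the function names are my own.

```python
def round_trip(text: str, codepage: str) -> str:
    """Encode text into a legacy code page and back, as a non-Unicode build effectively does.

    Characters the code page can't represent come back as '?', one common
    source of the question marks in a corrupted UI.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


def lost_characters(text: str, codepage: str) -> list:
    """List the characters that the given code page cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost
```

round_trip("日本語", "cp1252") comes back as "???", while "à ü ¿ ß Ø" survives cp1252 and "日本語" survives cp932, which is how a build can look fine in French and fall apart in Japanese.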

It's not rocket science, but it doesn't happen on its own, either. And you don't want your customers worldwide doing any more of your internationalization testing than absolutely necessary, because they really don't appreciate buying the product and then testing it.

The process requires some cooperation between Engineering and QA, which should already be in place for the domestic product and can easily be extended to the international products as well. An upcoming post will explain some of the tools and techniques for proper internationalization testing.

25 August 2006

Internationalization and the smart installer

Have we been thankful enough for InstallShield? I think it's a royal headache for the release engineers who have to get used to it, but it's a dream for a localization project manager:
  • InstallShield does most of the hard work. Most of the strings are already translated into more languages than most companies know what to do with.
  • Customized strings live in a single, text-based value.shl file, which the release engineers peel off and hand me for translation.
  • By default it creates language-specific branches in source control, which prevents, say, your Russian release from getting pasted in as a mere revision to your original English release.
The value.shl file is very simple, and ours changes so infrequently that it's easiest for me to update it myself (version numbers, copyright dates, URLs), without needing to hand it off for translation.

Of course, it did drive the release engineers batty in the early days, especially when I wandered in asking for 3 Asian and 2 Western installers every few months. The hard part for them is seeing far enough down the road to build a maintainable structure in source control. It never occurred to them to start out with branches labeled /en/ or /0009-English/ because they never foresaw the need for other languages, so they painted themselves into corners but didn't realize it until Chinese came along one day.

People in this industry write about introducing worldwide consciousness to the overall mindset of the organization, and evangelizing the gospel of localization; that's the 50,000-foot/15,240-meter level. Must be nice. I spend most of my time crawling in a trench in source control somewhere, trying to soften periods into decimal separators without getting flamed.

23 August 2006

Fixing that small internationalization gaffe

The engineers resolved the internationalization problem. Sort of.

They've modified the logic so that it no longer depends on the hardcoded presence of "&Tools" to pull the resources in correctly from two separate DLLs. However, it still looks for the literal "&Edit" in each DLL. If it doesn't find it, the submenu items do not show up. I know, because I broke it again with a random pseudo-translation pass that rendered "&Edit" as "&ßéüdßéüt" in one resource file and "&ßéüñdßéüñt" in the other.

"Well, what do you expect?" asked the developer, when I explained this to him. "Get your pseudo-act together and you won't find problems like this."

I granted him that it was very unlikely that "&Edit" would be translated differently in two places - well, it could happen, but it should not happen - but that was not the point. It's just not good programming practice to depend on string literals like that, whether localization engineering is a concern or not. "Why don't you make the dependency on the string ID instead? Localization will never go near that."
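
Here is the difference in miniature, modeled in Python with dictionaries instead of Win32 menu resources. The resource ID value and the function names are hypothetical; the pseudo-translated captions are the ones my pass produced.

```python
from typing import Dict, Optional

# Hypothetical resource ID; a real product would define it in a resource header.
ID_MENU_EDIT = 101

# Each dict models one DLL's menu table: resource ID -> caption as shipped.
english_menu = {ID_MENU_EDIT: "&Edit"}
pseudo_menu_one = {ID_MENU_EDIT: "&ßéüdßéüt"}
pseudo_menu_two = {ID_MENU_EDIT: "&ßéüñdßéüñt"}


def find_edit_menu_by_caption(menu_table: Dict[int, str]) -> Optional[int]:
    """Fragile: works only while the caption is the literal English '&Edit'."""
    for menu_id, caption in menu_table.items():
        if caption == "&Edit":
            return menu_id
    return None


def find_edit_menu_by_id(menu_table: Dict[int, str]) -> Optional[int]:
    """Robust: translators change captions, never resource IDs."""
    return ID_MENU_EDIT if ID_MENU_EDIT in menu_table else None
```

Both lookups behave identically on the English build; only the ID-based one survives translation, whether or not the two DLLs agree on the wording.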

"Submit a ticket on it and we'll see for next time," he replied. "I've got other dragons to slay right now."

So, I filed the request and the enhancement is in the great cosmic wash of the engineering team's Issue Review system.