21 August 2008

Localizing Code Snippets - Part II

Last week I posted on the dilemma of how to localize Code Snippets, the selected pieces of your documentation that you shoehorn into XML files so that Visual Studio can present them in tool-tip-like fashion to the user while s/he is writing code that depends on your documentation.

My goal was to ensure that the process of grabbing these bits of documentation (mostly one-sentence descriptions and usage tips) was internationalized, so that we could run it on translated documentation and save money. This has proved more difficult than anticipated.

Here is the lesson: If you think it's hard to get internal support for internationalizing your company's revenue-generating products, just try to get support for internationalizing the myriad hacks, scripts, macros and shortcuts your developers use to create those products.

In this client's case, it makes more sense to translate the documentation, then re-use that translation memory on all of the Code Snippet files derived from the documentation. It will cost more money (mostly for translation engineering and QA, rather than for new translation) in the short run, but less headache and delay in the long run. Not to mention fewer battles I need to fight.

Discretion is the better part of localization valor.


14 August 2008

Localizing Code Snippets

"Why would I localize code snippets?" you ask. (Go ahead; ask.)

Everybody knows you don't translate snippets of code. Even if you found a translator brave enough to take on something like int IBACKLIGHT_GetBacklightInfo(IBacklight *p, AEEBacklightInfo * pBacklightInfo), the compiler would just laugh and spit out error messages.

However, if you're a developer (say, of Windows applications) working in an integrated development environment (say, Microsoft Visual Studio), you may want to refer very quickly to the correct syntax and description of a feature without searching for it in the reference manual. The Code Snippet enhancement to Visual Studio makes this possible with a small popup box that contains thumbnail documentation on the particular interface the developer wants to use. It's similar in concept and appearance to the "What's This?" contextual help offered by right-clicking on options in many Windows applications.

How does the thumbnail documentation get in there? It's a tortuous path, but the enhancement pulls text from XML-formatted .snippet files. You can fill the .snippet files with the information yourself, or you can populate them from your main documentation source using Perl scripts and XSL transformation. So while you're not really translating code snippets, you're translating Code Snippets.

And therein lies the problem.


One of our clients is implementing Code Snippets, but the Perl and XSL transformation scripts they're using to extract the documentation don't support Unicode. I found this out because I pseudo-translated some of the source documentation and ran the scripts on it. Much of the text didn't survive to the .snippet files, so we're on a quest to find the offending portions of the scripts and suggest internationalization changes.

We've determined that the translated documentation in the Code Snippets will display properly in Visual Studio; the perilous part of the journey is the process of extracting the desired subset of documentation and pouring it into the .snippet files. Don't expect that your developers will automatically enable the code for this; you'll probably have to politely persist to have it done right.
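Here is a minimal sketch of the kind of round-trip check I mean, written in Python rather than the client's Perl and XSL. The element names are a simplified stand-in for a real .snippet file, and the function names are my own invention for illustration. The point is only that the encoding is stated explicitly on the way out, and that pseudo-translated text is compared before and after.

```python
import io
import xml.etree.ElementTree as ET


def build_snippet_xml(title, description):
    """Build a simplified, snippet-like XML document and return it as UTF-8 bytes.

    The element layout is a hypothetical stand-in, not the full .snippet schema.
    """
    root = ET.Element("CodeSnippet")
    header = ET.SubElement(root, "Header")
    ET.SubElement(header, "Title").text = title
    ET.SubElement(header, "Description").text = description
    buffer = io.BytesIO()
    # State the encoding explicitly instead of relying on a platform default.
    ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


def read_description(xml_bytes):
    """Parse the bytes back and return the description text."""
    root = ET.fromstring(xml_bytes)
    return root.findtext("Header/Description")


# Pseudo-translated description: accented Latin characters plus some Japanese.
pseudo = "Rétürñs thé bäcklïght ïñfö 日本語"
xml_bytes = build_snippet_xml("IBACKLIGHT_GetBacklightInfo", pseudo)
survived = read_description(xml_bytes) == pseudo
print(survived)
```

If the comparison fails anywhere along the real pipeline, that step is a good candidate for the offending portion of the scripts.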

Alternatives:
  • Wait until all of your documentation has been translated, then translate the .snippet files. It's more time-consuming and it will cost you more, but working this far downstream may be easier than getting your developers to clean up their scripts.
  • Make your Japanese developers tolerate English documentation in the Code Snippets.
Neither one is really the Jedi way. Work with your developers on this.


19 June 2008

Giant Localization Leap Backwards

"All of the strings are embedded in the code."

There was a time when I welcomed - or at least was not very much surprised by - sentences like this one. They came from engineers in response to my questions about the readiness of their software strings to be localized. Strings embedded in code, of course, are more or less inaccessible to localization techniques, since nobody wants to hand off an entire code base to a translator, and no translator wants to wade through an entire code base trying to find strings to translate.

So, when one of my client's engineers said it to me yesterday in reference to an application in a larger product we plan to localize, I briefly welcomed it. It means more work.

But then I realized that combing all of the strings out of the code and into separate, accessible files will require a great deal of time and effort (not mine). Engineers don't usually enjoy working on this kind of task, so it will fall to the bottom of the priority stack, and the product manager won't go to bat for it, and so this particular application will stick out like a sore thumb as a non-localized component in an otherwise localized product suite.
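To give a feel for what "combing the strings out" involves, here is a rough sketch in Python. It assumes C-style double-quoted literals, and the `IDS_` key prefix and the sample source lines are made up for illustration. A real pass would also have to tell user-visible text apart from file names, format strings and other literals that must not be translated, which is where most of the engineering time goes.

```python
import re

# Matches a double-quoted literal, allowing backslash escapes inside it.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_strings(source):
    """Return the unique, non-empty string literals in order of first appearance."""
    seen = []
    for match in STRING_LITERAL.finditer(source):
        text = match.group(1)
        if text and text not in seen:
            seen.append(text)
    return seen


def build_string_table(strings, prefix="IDS_"):
    """Assign each string a generated key, as a separate resource file would."""
    return {f"{prefix}{index:04d}": text
            for index, text in enumerate(strings, start=1)}


# Hypothetical source fragment with embedded strings.
sample = '''
MessageBox(hwnd, "File not found", "Error", MB_OK);
SetStatus("Ready");
MessageBox(hwnd, "Disk full", "Error", MB_OK);
'''

table = build_string_table(extract_strings(sample))
for key, text in table.items():
    print(key, text)
```

Even a crude count like this is useful for estimating how much work the enabling effort represents.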

"Is there a phased approach we could take to enabling this app for localization?" the engineer asked.

I appreciated his attempt to save the game, but a partially localized product is rather ugly. We could enable and translate the menu and dialog strings for this release, and go back for the error messages in the next release, but the mongrel product is not very appealing to users in the meantime.

This is disappointing, because we've made such long localization-strides elsewhere in the product suite, and dealing with this newly acquired app feels like such a giant leap backwards. I guess I'll work up some estimates on the time required to enable the application, then make my case to the product manager and development lead to generate some interest and start the process from the beginning.

Isn't that why we localization project managers and international product managers were sent here?

What do you do in your company when engineers tell you that all the strings are embedded in the code?


29 November 2007

Keeping an eye on Catalyst

In localization, "Catalyst" is a tool from Alchemy Software. Among other things, it allows you to localize UI elements within software resource files, sometimes without the need to rebuild the software manually into binary format.

Since software binaries come from text files, part of Catalyst's value lies in straddling the divide between allowing the translator to change strings in these text files (say, from English to Japanese) and displaying them in the binary, run-time format in which the user will see them on screen.

Last month a vendor returned some resource files to me which we had them localize from English to Japanese. I rebuilt the binaries (language-resource DLLs) and ran them. Unfortunately, a number of items were suddenly missing from the Japanese menus, so I had to troubleshoot the problem.

My first thought was that either a person or a tool (or a person using a tool) had modified something that should not be affected by the localization process. I had handed off a resource file containing these lines:

32777 MENU DISCARDABLE
BEGIN
    POPUP "&Tools"
    BEGIN
        MENUITEM "Serial P&ort Settings...", ID_TOOLS_SERIALPORTSETTINGS
        MENUITEM "&Network Settings...", ID_TOOLS_NETWORK
        MENUITEM "&Battery Settings...", ID_TOOLS_BATTERYSETTINGS
    END
END

32779 MENU DISCARDABLE
BEGIN
    POPUP "&File"
END


They returned to me a resource file containing these strings:

9 MENU DISCARDABLE
BEGIN
    POPUP "ツール(&T)"
    BEGIN
        MENUITEM "シリアルポートの設定(&O)...", ID_TOOLS_SERIALPORTSETTINGS
        MENUITEM "ネットワーク設定(&N)...", ID_TOOLS_NETWORK
        MENUITEM "バッテリの設定(&B)...", ID_TOOLS_BATTERYSETTINGS
    END
END

11 MENU DISCARDABLE
BEGIN
    POPUP "ファイル(&F)"
END

There was nothing wrong with the translation, and the string IDs were intact. The product has long been "double-byte clean," so I knew that the software was not gagging on the Japanese characters.

The problem lay in the menu ID numbers, which are 32777 and 32779 in the English, but which came back in the Japanese files as 9 and 11. The vendor believes that Catalyst changed them, since they had used it for resizing and QA.

Normally, this renumbering has no effect on how the binary functions. In this case, however, the effect is profound, because there is code somewhere in the software that looks for "32777" and "32779", and when it doesn't find those IDs, it cannot complete the menu. This is poor internationalization in the code base, which I have discussed with Engineering to no avail, so I need to police the resource files in each round of localization.
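Policing by eye gets old quickly, so something like the following Python sketch could do it for you. It assumes both files declare their menus in the same order and that each menu header sits on one line in the form "ID MENU ..."; the function names are mine, and this is not a Catalyst feature.

```python
import re

# First token on the line is the menu ID, followed by MENU or MENUEX.
MENU_HEADER = re.compile(r"^\s*(\S+)\s+MENU(?:EX)?\b")


def menu_ids(rc_text):
    """Return the menu IDs in the order they are declared."""
    ids = []
    for line in rc_text.splitlines():
        match = MENU_HEADER.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def changed_menu_ids(source_rc, localized_rc):
    """Return (source_id, localized_id) pairs that no longer match."""
    source_ids = menu_ids(source_rc)
    localized_ids = menu_ids(localized_rc)
    if len(source_ids) != len(localized_ids):
        raise ValueError("menu count differs between source and localized files")
    return [(s, l) for s, l in zip(source_ids, localized_ids) if s != l]


english = """32777 MENU DISCARDABLE
BEGIN
POPUP "&Tools"
BEGIN
MENUITEM "&Network Settings...", ID_TOOLS_NETWORK
END
END
32779 MENU DISCARDABLE
BEGIN
POPUP "&File"
END
"""

japanese = """9 MENU DISCARDABLE
BEGIN
POPUP "ツール(&T)"
BEGIN
MENUITEM "ネットワーク設定(&N)...", ID_TOOLS_NETWORK
END
END
11 MENU DISCARDABLE
BEGIN
POPUP "ファイル(&F)"
END
"""

for source_id, localized_id in changed_menu_ids(english, japanese):
    print(f"menu ID changed: {source_id} -> {localized_id}")
```

Run on each delivery from the vendor, a check like this flags the renumbering before the binaries are rebuilt.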

How is Catalyst working for you? Have you seen similar problems?

Interested in this topic? You might enjoy another article I've written called "Localized Binaries - The Plot Thickens"


19 October 2007

Whaddya know? They asked me first this time!

Do you spend a lot of your time running to catch up to the train? Have you ever been surprised in the middle of a meeting by project plans that were well underway with no thought given yet to localization? Are you getting used to it?

What if they asked you first (or at least early on) about the project's implications for internationalization and localization? Would you know how to react?

This certainly caught me by surprise a few months ago. A client called me in for consultation. He didn't want me to manage the upcoming localization of his user manuals; he wanted me to review and edit the English versions so that they would be ready to localize.

This client, though small, is enlightened. The company is selling English, French, German, Spanish and Japanese versions of several products, and it has a hand-in-glove relationship with its localization company. It knows where its global bread is buttered.

I jumped at the chance to work with people thinking this far in advance, so I reviewed the manuals and submitted changes, almost all of which were acceptable.

How can you review/edit documentation with an eye to translating it?
  1. Take advantage of redundancy. Ensuring that identical sentences and paragraphs remain identical is a good way to lower per-word translation costs. Turn the text into a bookmark at its first occurrence, then invoke or cross-reference that bookmark at subsequent occurrences.
  2. Ensure that the product matches the documentation. Not all organizations get around to this, believe it or not, and it becomes a bit of value added by the internationalization/localization function.
  3. Standardize terms. Especially in companies without a well-developed team of writers, manuals end up with pairs or trios of synonyms that will vex translators and add no information, so take the liberty of eliminating one in favor of the other:
    • determine/specify
    • based on/according to
    • click the button/click on the button/select the button
    • lets you/enables you to/allows you to
  4. Mention errors and inconsistencies that have nothing to do with internationalization. Again, you increase the perceived value of the localization function. Even though the result doesn't affect the localized products, the Localization Department (you) is contributing to a better core product.
  5. Axe a few "dead" words. They add little to the explanation, will probably not survive translation, and inflate wordcount:
    • unique
    • basically
    • popular
    • congratulations
    • very much
By the way, the review took longer than I'd anticipated, so if you have a similar opportunity, don't bid a flat fee the first time.
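The term standardization in step 3 lends itself to a quick script. This Python sketch flags any group of synonyms where more than one variant turns up in a manual; the groups are taken from the list above, and the function name and sample text are only for illustration.

```python
import re

# Each group lists phrases that should be collapsed into a single choice.
VARIANT_GROUPS = [
    ["lets you", "enables you to", "allows you to"],
    ["click the button", "click on the button", "select the button"],
    ["based on", "according to"],
]


def find_variants(text):
    """Report each group in which more than one variant appears in the text."""
    lowered = text.lower()
    report = []
    for group in VARIANT_GROUPS:
        counts = {}
        for phrase in group:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[phrase] = len(re.findall(pattern, lowered))
        used = {phrase: n for phrase, n in counts.items() if n}
        if len(used) > 1:
            report.append(used)
    return report


manual = (
    "This option lets you save the file. "
    "The wizard allows you to print it. "
    "Click the button to finish."
)
print(find_variants(manual))
```

It only counts; deciding which variant wins is still an editor's call.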

Interested in this topic? Have a look at Improved Docs through Localization.


13 March 2007

Localization versus Internationalization

Do you know the difference between localizing and internationalizing? Most people can go their entire lives without knowing it, but there is a useful distinction.

You localize your product when you customize it to the needs of a particular market, usually on geographic and linguistic bounds. So if you're Toyota, and you want to sell cars in California, you need to equip them with the more robust emission control systems required by law there. Or, if you want to sell them in the UK, you need to build them with the steering wheel on the right-hand side.

What you soon discover, though, is that it's prohibitively expensive to create and maintain separate production lines for California, the UK and other foreign markets, not to mention your domestic market. The challenge then becomes to properly internationalize your product so that the changes needed for each market cause as little disruption as possible to your production process. You design your cars so that there is some irreducible core that is the same, no matter where the car will be sold.

This is easier said than done, but in a technology company, where production lines move extremely fast - because it's all just software - it's important to bite the internationalization bullet early and often. The alternatives are multiple, straggler code lines; separate Web sites; and unintegrated, unwieldy sets of documentation.

So, when all about you are losing their heads over localization (L10n), you can be the calm voice of reason, asking whether the product is really ready for localization yet, or whether your colleagues should pause and think internationalization (I18n) first.

(Note: The term g l o b a l i z a t i o n is also used to describe the overall process of creating products for worldwide markets. The problem, in our search-engine-optimized day and age, is that the same term applies to the assimilation of national character and identity into the worldwide market, with undesirable ramifications to the disenfranchised. It's not wrong to talk about g l o b a l i z i n g your product, but if you go searching for information on localization, don't use the g-word.)


06 March 2007

How to pseudo-translate, Part I

Before you localize your software product, wouldn't you like to have an idea of what's going to break as a result?

If you've written it in English, it will surprise and alarm you to learn that that's no assurance that it will work when the user interface (UI) is in Chinese or Arabic or maybe even Spanish. The most conspicuous vulnerabilities are:
  • text swell, in which "prompt" becomes "Eingabeaufforderung" in German, for example, and the 40 pixels of width you've reserved in the English UI show only a small part of the German;
  • corrupted characters, which will show up in the UI as question marks or little black boxes because characters such as à, ü, ¿, ß, Ø and 日本語 aren't in the code page or encoding under which your software is compiled;
  • illegible or invalid names of files and paths, which occur when installing your software on an operating system that will handle more kinds of characters than your product will;
  • crashes, which occur when your software mishandles the strange characters so badly that the program just giggles briefly and then dies;
  • ethnocentric business logic, which leads to ridiculous results when users select unanticipated countries or currencies;
  • hard-coded anything, whether currency symbols, standards of measurement (metric vs. English) or UI strings.
In the past, localization efforts have become stranded on these beaches late in the voyage, after the text has been translated and the binaries rebuilt. It needn't be that way.

Internationalization testing is the process of pushing alien characters and situations down your software's throat to see what breaks. The more complex the software, the more complex the testing, such that there are companies that specialize in internationalization as much as if not more than localization.
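As a taste of what pushing alien characters at the software can look like, here is a small pseudo-translation sketch in Python. It is one possible approach, not any particular tool's algorithm: it swaps some letters for accented ones, pads the string to simulate text swell, and wraps it in brackets so truncation is easy to spot. The character immediately after "&" or "%" is left alone, which is a crude way of protecting accelerator keys and simple format specifiers; longer specifiers such as "%ld" would need proper parsing.

```python
# Map plain letters to accented look-alikes.
ACCENTED = str.maketrans("aeiouncAEIOUNC", "àéïöüñçÀÉÏÖÜÑÇ")


def pseudo_translate(text, swell=0.4):
    """Return an accented, padded, bracketed version of text."""
    out = []
    keep_next = False
    for ch in text:
        if keep_next:
            out.append(ch)          # character after & or % stays as-is
            keep_next = False
        elif ch in "&%":
            out.append(ch)
            keep_next = True
        else:
            out.append(ch.translate(ACCENTED))
    # Pad by a fraction of the original length to simulate text swell.
    padding = "~" * max(1, round(len(text) * swell))
    return "[" + "".join(out) + padding + "]"


print(pseudo_translate("&Edit"))
print(pseudo_translate("Serial P&ort Settings..."))
```

If the closing bracket is missing on screen, the control is too narrow; if the accented letters turn into question marks or boxes, the encoding is the problem.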

It's not rocket science, but it doesn't happen on its own, either. And, you don't want your customers worldwide doing any more of your internationalization testing than absolutely necessary, because they really don't appreciate buying the product and then testing it.

The process requires some cooperation between Engineering and QA, which should already be in place for the domestic product and can easily be extended to the international products as well. An upcoming post will explain some of the tools and techniques for proper internationalization testing.


07 October 2006

Localization and the Perl Script

After some cajoling, I've prevailed on our tech-writer-who-doesn't-do-any-writing to modify his Perl scripts. The changes will remove the thousands of CRLFs (hard returns) in the 3700 extracted HTML files, and result in better Trados matching between the new files and translation memory.

Then, of course, it will take a few hours' perusal to see what breaks as a result of that fix.

It seems to be an unsung inconvenience of localization that
a
sentence
put
together
with
these
words
and
looking
like
this

separated by hard returns in the raw HTML file (which you can see by viewing source in a browser) becomes

a sentence put together with these words and looking like this

when viewed in a browser. The translation memory tools, of course, see the hard returns and try in vain to match accordingly, and the resulting mismatches can cause a fair bit of head-scratching for those viewing the files only through a browser.
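I don't have the modified Perl to show, but the idea can be sketched in a few lines of Python. This version collapses runs of whitespace, hard returns included, into single spaces, and leaves anything inside a pre element untouched, since line breaks matter there. The function name is mine, and a production version would want a real HTML parser rather than regular expressions.

```python
import re

# Capture <pre>...</pre> blocks so they can be passed through unchanged.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.DOTALL | re.IGNORECASE)


def unwrap_hard_returns(html):
    """Collapse whitespace runs to single spaces outside <pre> blocks."""
    parts = PRE_BLOCK.split(html)
    cleaned = []
    for index, part in enumerate(parts):
        if index % 2:
            cleaned.append(part)    # odd indexes are the captured <pre> blocks
        else:
            cleaned.append(re.sub(r"\s+", " ", part))
    return "".join(cleaned)


raw = "<p>a\r\nsentence\r\nput\r\ntogether</p>\r\n<pre>x\r\ny</pre>"
print(unwrap_hard_returns(raw))
```

With the hard returns gone, the segment the translation memory tool sees is the same sentence the reader sees in the browser.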


25 August 2006

Internationalization and the smart installer

Have we been thankful enough for InstallShield? I think it's a royal headache for the release engineers that have to get used to it, but it's a dream for a localization project manager:
  • InstallShield does most of the hard work. Most of the strings are already translated into more languages than most companies know what to do with.
  • Customized strings live in a single, text-based value.shl file, which the release engineers peel off and hand me for translation.
  • By default it creates language-specific branches in source control, which prevents, say, your Russian release from getting pasted in as a mere revision to your original English release.
The value.shl file is very simple, and ours changes so infrequently that it's easiest for me to update it myself (version numbers, copyright dates, URLs), without need to hand it off for translation.

Of course, it did drive the release engineers batty in the early days, especially when I wandered in asking for 3 Asian and 2 Western installers every few months. The hard part for them is seeing far enough down the road to build a maintainable structure in source control. It never occurred to them to start out with branches labeled /en/ or /0009-English/ because they never foresaw the need for other languages, so they painted themselves into corners but didn't realize it until Chinese came along one day.

People in this industry write about introducing worldwide consciousness to the overall mindset of the organization, and evangelizing the gospel of localization; that's the 50,000-foot/15,240-meter level. Must be nice. I spend most of my time crawling in a trench in source control somewhere, trying to soften periods into decimal separators without getting flamed.


23 August 2006

Fixing that small internationalization gaffe

The engineers resolved the internationalization problem. Sort of.

They've modified the logic so that it no longer depends on the hardcoded presence of "&Tools" to pull the resources in correctly from two separate DLLs. However, it still looks for the literal "&Edit" in each DLL. If it doesn't find it, the submenu items do not show up. I know, because I broke it again with a random pseudo-translation pass that rendered "&Edit" as "&ßéüdßéüt" in one resource file and "&ßéüñdßéüñt" in the other.

"Well, what do you expect?" asked the developer, when I explained this to him. "Get your pseudo-act together and you won't find problems like this."

I granted him that it was very unlikely that "&Edit" would be translated differently in two places - well, it could happen, but it should not happen - but that was not the point. It's just not good programming practice to depend on string literals like that, whether localization engineering is a concern or not. "Why don't you make the dependency on the string ID instead? Localization will never go near that."

"Submit a ticket on it and we'll see for next time," he replied. "I've got other dragons to slay right now."

So, I filed the request and the enhancement is in the great cosmic wash of the engineering team's Issue Review system.


20 August 2006

Bad internationalization practice

Unfortunately, there's been another architecture change besides the move to .NET: Engineering has split the resource DLL into two pieces.

This is not bad news in itself, but there is a tricky dimension to putting the two DLLs together at run time, and the engineers have handled it in a way that assumes a little too much.

The main menu contains the usual entries (File, Edit, View, Tools, Windows, Help), each of which contains a submenu. The localization hiccup is that some of the submenu items live in one DLL, and the others live in the other DLL. What brings them together at run-time? The software depends on the presence of the string "&Edit" in each one. What happens when "&Edit" gets translated? "Oh, well, I guess we didn't think of that..."

The pseudo-translated string reads "&ßéüdßéüt". The sets of submenu items don't find one another in the DLLs at run-time, so they simply don't show up in the menus. Another triumph for the farsightedness of internationalization testing, and back to the drawing board for the developers.
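To make the difference concrete, here is a toy model in Python. It is not the product's code: the `Menu` class, the ID 101 and the item names are all hypothetical. It only contrasts joining two sets of submenu items on the visible caption with joining them on an ID that translators never touch.

```python
from dataclasses import dataclass, field


@dataclass
class Menu:
    menu_id: int            # stable identifier, untouched by localization
    caption: str            # visible text, changed by translation
    items: list = field(default_factory=list)


def find(menus, predicate):
    """Return the first menu satisfying predicate, or None."""
    return next((menu for menu in menus if predicate(menu)), None)


def merge_by_caption(first_dll, second_dll, caption):
    """Fragile: breaks as soon as the caption is translated."""
    a = find(first_dll, lambda m: m.caption == caption)
    b = find(second_dll, lambda m: m.caption == caption)
    return a.items + b.items if a and b else []


def merge_by_id(first_dll, second_dll, menu_id):
    """Robust: the ID is the same in every language."""
    a = find(first_dll, lambda m: m.menu_id == menu_id)
    b = find(second_dll, lambda m: m.menu_id == menu_id)
    return a.items + b.items if a and b else []


EDIT_ID = 101  # hypothetical ID for the Edit menu

# Pseudo-translated resources: the caption changed, the ID did not.
first_dll = [Menu(EDIT_ID, "&ßéüdßéüt", ["Cut", "Copy"])]
second_dll = [Menu(EDIT_ID, "&ßéüdßéüt", ["Find"])]

print(merge_by_caption(first_dll, second_dll, "&Edit"))
print(merge_by_id(first_dll, second_dll, EDIT_ID))
```

The caption-based lookup comes back empty on the pseudo-translated resources, which is the missing-submenu symptom; the ID-based lookup is indifferent to what the translators did.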
