21 August 2008

Localizing Code Snippets - Part II

Last week I posted on the dilemma of how to localize Code Snippets, the selected pieces of your documentation that you shoehorn into XML files so that Visual Studio can present them in tool-tip-like fashion to the user while s/he is writing code against the interfaces your documentation describes.

My goal was to ensure that the process of grabbing these bits of documentation (mostly one-sentence descriptions and usage tips) was internationalized, so that we could run it on translated documentation and save money. This has proved more difficult than anticipated.

Here is the lesson: If you think it's hard to get internal support for internationalizing your company's revenue-generating products, just try to get support for internationalizing the myriad hacks, scripts, macros and shortcuts your developers use to create those products.

In this client's case, it makes more sense to translate the documentation, then re-use that translation memory on all of the Code Snippet files derived from the documentation. It will cost more money (mostly for translation engineering and QA, rather than for new translation) in the short run, but less headache and delay in the long run. Not to mention fewer battles I need to fight.

Discretion is the better part of localization valor.


14 August 2008

Localizing Code Snippets

"Why would I localize code snippets?" you ask. (Go ahead; ask.)

Everybody knows you don't translate snippets of code. Even if you found a translator brave enough to take on something like int IBACKLIGHT_GetBacklightInfo(IBacklight *p, AEEBacklightInfo * pBacklightInfo), the compiler would just laugh and spit out error messages.

However, if you're a developer (say, of Windows applications) working in an integrated development environment (say, Microsoft Visual Studio), you may want to refer very quickly to the correct syntax and description of a feature without searching for it in the reference manual. The Code Snippet enhancement to Visual Studio makes this possible with a small popup box that contains thumbnail documentation on the particular interface the developer wants to use. It's similar in concept and appearance to the "What's This?" contextual help offered by right-clicking on options in many Windows applications.

How does the thumbnail documentation get in there? It's a tortuous path, but the enhancement pulls text from XML-formatted .snippet files. You can fill the .snippet files with the information yourself, or you can populate them from your main documentation source using Perl scripts and XSL transformation. So while you're not really translating code snippets, you're translating Code Snippets.
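
For orientation, here is roughly what one of those files looks like. This is a stripped-down sketch based on the Visual Studio snippet schema as I understand it; the file name, sample content and Language attribute are placeholders rather than the client's real data. The only parts a translator would ever touch are the Title and Description text in the Header; everything else is structure or code:

    <?xml version="1.0" encoding="utf-8"?>
    <CodeSnippets xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
      <CodeSnippet Format="1.0.0">
        <Header>
          <Title>Get backlight information</Title>
          <Description>Retrieves the current backlight settings for the device.</Description>
        </Header>
        <Snippet>
          <Code Language="CSharp">
            <![CDATA[IBACKLIGHT_GetBacklightInfo(p, pBacklightInfo);]]>
          </Code>
        </Snippet>
      </CodeSnippet>
    </CodeSnippets>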

And therein lies the problem.


One of our clients is implementing Code Snippets, but the Perl scripts and XSL transformation scripts they're using to extract the documentation don't support Unicode. I found this out because I pseudo-translated some of the source documentation and ran the scripts on it. Much of the text didn't survive to the .snippet files, so we're on a quest to find the offending portions of the scripts and suggest internationalization changes.
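
In case it helps anyone on a similar quest, the usual culprit is a script that reads and writes text without ever declaring an encoding, so anything outside ASCII (or the system code page) gets mangled in transit. The client's scripts are Perl, where the fix is typically to open files with an explicit :encoding(UTF-8) layer; the same idea in a minimal Python sketch, with hypothetical file names, looks like this:

    # Minimal sketch: carry a description from the documentation source into a
    # .snippet file without losing non-ASCII text. File names are hypothetical.
    from xml.sax.saxutils import escape

    with open("reference_description.txt", encoding="utf-8") as src:
        description = src.read().strip()

    snippet_xml = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<CodeSnippets>\n"
        f"  <Description>{escape(description)}</Description>\n"
        "</CodeSnippets>\n"
    )

    # State the encoding explicitly on output, too; the platform default may not be UTF-8.
    with open("GetBacklightInfo.snippet", "w", encoding="utf-8") as out:
        out.write(snippet_xml)

The point is not the language but the habit: every read and every write names its encoding, so a pseudo-translated à or 日本語 comes out the other end intact.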

We've determined that the translated documentation in the Code Snippets will display properly in Visual Studio; the perilous part of the journey is the process of extracting the desired subset of documentation and pouring it into the .snippet files. Don't expect that your developers will automatically enable the code for this; you'll probably have to politely persist to have it done right.

Alternatives:
  • Wait until all of your documentation has been translated, then translate the .snippet files. It's more time-consuming and it will cost you more, but working this far downstream may be easier than getting your developers to clean up their scripts.
  • Make your Japanese developers tolerate English documentation in the Code Snippets.
Neither one is really the Jedi way. Work with your developers on this.


29 May 2008

Localizing Robohelp Files - The Basics

We get a lot of search engine queries like "localize Robohelp file" and "translate help project." I'm pretty sure that most of them come from technical writers who have used Robohelp to create help projects (Compiled HTML Help Format), and who have suddenly received the assignment to get the projects localized.

The short answer
Find a localization company that can demonstrate to your satisfaction that it has done this before, and hand off the entire English version of your project - .hhp, .hhc, .hhk, .htm/.html and, of course, the .chm. Then go back to your regularly scheduled crisis. You should give the final version a quick smoke test before releasing it, for your own edification as well as to see whether anything is conspicuously missing or wrong.

The medium answer
Maybe you don't have the inclination or budget to have this done professionally, and you want to localize the CHM in house. Or perhaps you're the in-country partner of a company whose product needs localizing, and you've convinced yourself that it cannot be that much harder than translating a text file, so why not try it?

You're partially right: it's not impossible. In fact, it's even possible to decompile all of the HTML pages out of the binary CHM and start work from there. But your best bet is to obtain the entire help project mentioned above and then use translation memory software to simplify the process. Once you've finished translating, you'll need to compile the localized CHM using Robohelp, another help-authoring product, or even the bare hhc.exe compiler.
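
For what it's worth, both halves of that round trip can be driven from the command line once the (free) HTML Help Workshop is installed. The project and file names below are made up; note that decompiling is actually handled by the help viewer, hh.exe, rather than by the compiler:

    rem Compile the translated project into a CHM
    hhc.exe MyProject_ja.hhp

    rem Pull the HTML pages back out of an existing CHM into a folder
    hh.exe -decompile extracted_pages MyProject.chm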

The long answer
This is the medium answer with a bit more detail and several warnings.
  • There may be a way to translate inside the compiled help file, but I wouldn't trust it. Fundamentally, it's necessary to translate all of the HTML pages, then recompile the CHM; thus, it requires translation talent and some light engineering talent. If you're missing either one, stop and go back to The Short Answer.
  • hhc.exe is the Microsoft HTML Help compiler. It's part of the HTML Help Workshop, freely available from Microsoft (the viewer, hh.exe, is what actually ships with Windows). The workshop is not an authoring environment like Robohelp, but it offers the engineering muscle to create a CHM once you have created all of the HTML content. If you have to localize a CHM without recourse to the original project, you can use the workshop (or hh.exe from the command line) to decompile all of the HTML pages out of the CHM.
  • Robohelp combines an authoring environment for creating the HTML pages and the hooks to the HTML Help compiler. As such, it is the one-stop shopping solution for creating a CHM. However, it is known to introduce formatting and features that confuse the standard compiler, such that some Robohelp projects need to be compiled in Robohelp.
  • Robohelp was developed by BlueSky Software, which morphed into eHelp, which was acquired by Macromedia, which Adobe bought. Along the way it made some decisions about Asian languages that resulted in the need to compile Asian-language projects with the Asian-language version of Robohelp. This non-international approach was complicated by the fact that not every English version of Robohelp had an Asian-language counterpart. Perhaps Adobe has dealt with this by now, but if you're still authoring in early versions, be prepared for your localization vendor to tell you that it needs to use an even earlier Asian-language version.
  • Because the hierarchical table of contents is not HTML, you may find that you need to assign to it a different encoding from that of the HTML pages for everything to show up properly in the localized CHM, especially in double-byte languages.
  • The main value in a CHM lies in the links from one page to another. In a complex project, these links can get quite long. Translators should stay away from them, and the best way to accomplish that is with translation memory software such as Déjà Vu, SDL Trados, across or Wordfast. These tools insulate tags and other untouchable elements from even novice translators.
We've marveled at how many search engine queries there are about localizing these projects, and we think that Robohelp and the other authoring environments have done a poor job explaining what's involved.

If you liked this article, have a look at "Localizing Robohelp Projects."


27 July 2007

Virtual machines as localization testbenches

What are you doing for localized testbenches? Are you still partitioning hard drives or, worse yet, dedicating entire machines to a single language-platform combination? Lab getting a bit crowded and hot, is it? Consider using virtual machines, or VMs.

Microsoft's Virtual PC 2007 is free and uncrippled, so you can create, run and administer your own VMs. It's not bad software, and you can be sure that if there are any "special" tricks a VM should know about the Windows version hosting it, this product will know them. VMWare's full product is not free, but there is a free player, so if somebody in your department has the full product and can create VMs for you, you can use them as you would a normal machine.

A VM is just a huge file (around 1GB for Windows 2000, 3GB for Windows XP, 6GB for Vista) that you mount and run as a "guest" session in its own window. It's like having a computer inside a computer, although it takes away drive space, RAM and processor cycles that the "host" system - the one your computer runs normally - used to use. You can start and stop the VMs as you need them, and you can install almost any OS or language to run as a guest inside the VM; however, you do need to procure a legal copy of that OS/language combination.

Tiring of so much ancient kit lying around the lab, we've begun to migrate to VMs. They're not a panacea, but they make things like remote testing a good deal easier, they require less hardware, and they make dual-boot configurations irrelevant. There's quite a performance hit, unfortunately, and we're finding that VMWare VMs are a bit more responsive than Microsoft's. However, most localization testing is focused on UI and functionality rather than on performance, so this may not affect your lab unduly.

Creating one is not that difficult. Here's an example for Microsoft's Virtual PC 2007:
  1. Obtain a machine with about 2GB of RAM and at least 80GB of drive space, running, say, Windows XP English.
  2. Download and install MS Virtual PC 2007.
  3. Create a virtual machine, specifying Windows XP. For installation, give the VM 1GB of RAM; you can reduce that amount later if need be.
  4. Obtain the installation disk for the desired OS (e.g., Windows XP Japanese) and place it in the drive.
  5. Start the VM. As it opens in its new window, specify that you want to capture the CD drive (or the image file on the host machine, if you're mounting a .img).
  6. Installation will then take place as normal, with disk checking, file copying, and all the configuration and rebooting you would expect of an installation on a physical drive.
  7. A few GB later, you have a WinXP Japanese VM running as a guest on your Win XP English host. Install the Virtual Machine Additions to enable features like host-to-guest drag-and-drop.
Of note:
  1. Your VMs don't inherit domain information from the host machine, so if you want the VMs on the domain for things like advertised programs and SMS pushes, you'll need to arrange that separately with the network administrator.
  2. The VM files just get bigger. There is a feature to "compact" the VMs, but we've found that the resulting file takes no less space on disk.
  3. Migrating from existing physical drives into a VM is a crapshoot. Our most wildly successful experimentation has resulted in Blue Screen of Death, so don't expect to take an existing testbench and copy it into a new VM, any more than you would expect it to work going from a desktop to a laptop physical machine.
  4. There's an emulation layer between the VM and the hardware, so peripherals (USB devices, dongles) may not run the same way as they do on a physical drive.
  5. VMs are portable, so if you can get one to run on your desktop, you should be able to copy it to other computers and use it on them. For that matter, VMWare VMs can be hosted on servers and run remotely.


01 June 2007

Market Requirements for Localization

What good is all the market research if your product doesn't support the locale, and if Engineering can't get it to support the locale?

As product manager, you're pleased with your product's global reach. You've successfully localized for the low-hanging fruit (other languages with Latin-based character sets: Spanish, German, even the Nordic languages), and your product and Web site make customers happy all over the Western world. You have established robust processes for:
  • researching the needs of each foreign market
  • making those needs an integral part of the product requirements
  • working with Engineering on timetables for support of the needs
  • working with QA to ensure the engineering work can be adequately tested
  • releasing in foreign markets and enjoying success in them
Now talk turns to Asian markets, and multibyte enabling of your product and Web presence. You meet with Engineering and, as they've done for the European languages, they assure you that their code is, or will be, clean, and that you'll replicate in Asia the success you've had in Europe. Everybody nods, and it's just like Euro-success again.

But what if it isn't?

As product manager, you want to do your usual, excellent job of identifying market requirements and writing up the intelligence so that Engineering knows what the product needs to support. You'd better scratch a little harder, though.
  1. How is Engineering going to validate the product for multibyte? Peer review of code? Bring in an internationalization engineer? Pseudo-translation? You can't just take their word for it; you have too much at stake.
  2. Can your Web team create a staging environment and test cases close enough to what the production environment will be like?
  3. Has QA done a good job in flushing out bugs in your other localized products? Look back at the bug reports from German or Finnish; did they really find many problems? Did they all get fixed? Do you know for sure that they're testing under production-caliber conditions and on production-caliber testbenches?
  4. Do you really need to launch in Japanese, Korean and two versions of Chinese at the same time? Can you adopt a phased approach? Which market can give you the best support as you're enabling your product? (Hint: It's often Japan.)
This is the localization-equivalent of getting your ducks in a row. After you've done all the work of finding out what the market requires, you'd better be sure that the product you want to sell them really will perform as you claim.

Engineering, this is not business as usual. This is Asia.


11 May 2007

Localizing RoboHelp projects

Is it time for you to localize your RoboHelp projects? What's involved?

"RoboHelp project" is shorthand for "compiled help system." When this lives on a Windows client computer, it usually takes the form of HTML Help (CHM) files. There are other variations, like Web Help, which is generated from the same HTML source but does not run on the client.

The projects are a set of HTML files, authored in a tool such as--but not limited to--RoboHelp, then compiled into a binary form that allows for indexing, hierarchy and table of contents. Other platforms (Mac OS, Linux, Java) require a different compiler, but the theory is the same.

If you've done localization before, you'll find that RoboHelp projects are relatively easy, compared to a software project. RoboHelp (or whatever your authoring/compilation environment may be) creates a directory structure and file set that is easy to archive and hand off. It includes a main project file, table of contents file and index file. In fact, it's even possible in a pinch to simply hand off the compiled file, and have the localizers decompile it; the files they need will fall into place as a result of the decompilation.

Although you may think of the project as a single entity for localization purposes, each HTML page is a separate component. There may be large numbers of these pages that don't change from one version of your product to the next; nevertheless, you need to hand them off with the project, and you'll likely be charged for a certain amount of "touching" that the localizer's engineers will need to do. You may be able to save them some work and yourself some money by analyzing the project and determining which pages have no translatable changes, but by and large you should consider the costs for touching unchanged pages an unavoidable expense.

The biggest problem with these projects is in-country review. There's no easy way for an in-country reviewer to make changes or post comments in the compiled localized version. We've found that MS Excel is the worst way of doing this (except for all the others), so we've learned to live with it.

In theory, the translators are not mucking about with any tags, so the compiled localized version should work the same as the original. Yeah, right. All the links need to be checked--they do break sometimes--and the index and table of contents should be validated. And, don't forget to try a few searches to make sure they work; your customers surely will, and you want to spare them any unpleasant surprises.
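
If you have the translated HTML pages on disk before compiling, even a crude script can catch the worst of the broken internal links ahead of that manual check. Here's a rough sketch in Python; it assumes a flat folder of pages and only looks at plain relative href values, so treat it as a first pass, not a QA plan:

    # Rough first-pass link check over a flat folder of translated HTML pages.
    import os, re, sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    pages = {f for f in os.listdir(root) if f.lower().endswith((".htm", ".html"))}
    href = re.compile(r'href="([^"#:]+\.html?)"', re.IGNORECASE)

    for page in sorted(pages):
        with open(os.path.join(root, page), encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        for target in href.findall(text):
            if os.path.basename(target) not in pages:
                print(f"{page}: broken link -> {target}")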

Remember:
  • If you've included graphics in your help project, you'll need to obtain the original source files. These are not GIFs or JPEGs; they will be the application files from which the GIFs and JPEGs were generated. You'll need to hand off files from applications like Adobe Illustrator, Flash or even PowerPoint, so that the translators can properly edit the text in them. Engineers often do quick mock-ups in Microsoft Word's WordArt that end up in the final product, and it takes a while to track them down.
  • Encoding can be thorny. Some compilers behave oddly if you try to impose the same encoding on both the HTML pages and the table of contents, especially in Japanese, in our experience.


06 March 2007

How to pseudo-translate, Part I

Before you localize your software product, wouldn't you like to have an idea of what's going to break as a result?

If you've written it in English, it will surprise and alarm you to learn that that's no assurance that it will work when the user interface (UI) is in Chinese or Arabic or maybe even Spanish. The most conspicuous vulnerabilities are:
  • text swell, in which "prompt" becomes "Eingabeaufforderung" in German, for example, and the 40 pixels of width you've reserved in the English UI result in only a small part of the German appearing;
  • corrupted characters, which will show up in the UI as question marks or little black boxes because characters such as à, ü, ¿, ß, Ø and 日本語 aren't in the code page or encoding under which your software is compiled;
  • illegible or invalid names of files and paths, which occur when installing your software on an operating system that will handle more kinds of characters than your product will;
  • crashes, which occur when your software mishandles the strange characters so badly that the program just giggles briefly and then dies;
  • ethnocentric business logic, which leads to ridiculous results when users select unanticipated countries or currencies;
  • hard-coded anything, whether currency symbols, standards of measurement (metric vs. English) or UI strings.
In the past, localization efforts have become stranded on these beaches late in the voyage, after the text has been translated and the binaries rebuilt. It needn't be that way.

Internationalization testing is the process of pushing alien characters and situations down your software's throat to see what breaks. The more complex the software, the more complex the testing, such that there are companies that specialize in internationalization as much as, if not more than, localization.
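
To make that concrete, here is a tiny pseudo-translation sketch in Python. The resource file name and the key=value format are hypothetical, and real tools do a good deal more; the idea is simply to pad every string (to expose text swell) and wrap it in non-ASCII characters (to expose code page and encoding problems) while leaving it readable enough to test with:

    # Toy pseudo-translator for a key=value resource file (names are hypothetical).
    import re

    def pseudo(s: str) -> str:
        padded = s + "~" * max(4, len(s) // 2)   # roughly 50% swell
        return f"[àüß {padded} 日本語]"

    with open("strings_en.properties", encoding="utf-8") as src, \
         open("strings_xx.properties", "w", encoding="utf-8") as dst:
        for line in src:
            m = re.match(r"(\s*[\w.]+\s*=\s*)(.*)", line)
            dst.write(m.group(1) + pseudo(m.group(2)) + "\n" if m else line)

Build the product with the pseudo-translated strings and problems like text swell and corrupted characters announce themselves long before a translator ever sees the files.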

It's not rocket science, but it doesn't happen on its own, either. And, you don't want your customers worldwide doing any more of your internationalization testing than absolutely necessary, because they really don't appreciate buying the product and then testing it.

The process requires some cooperation between Engineering and QA, which should already be in place for the domestic product and can easily be extended to the international products as well. An upcoming post will explain some of the tools and techniques for proper internationalization testing.


07 December 2006

Localized Binaries - The Plot Thickens

The engineer has demonstrated that it is no longer possible to build just the resource binaries; it is now necessary to build the entire blinking product.

"Why is that?" I ask.

"We've improved the makefile," he replies. The makefile is a script used by the make command to build binaries.

"That doesn't feel like an improvement to me," I venture. "Why can't I just build the two or three resource binaries I need? I don't need all of the executables and other rot."

"Yes, well, we've improved the makefile."

"But there was a small, localized makefile that lived in each of the directories of the resource binaries I wanted. What happened to them?"

"We improved the main makefile by rolling all of those lower-level makefiles into it."

That's a hint to me that they improved it for the purpose of creating all of the files that go into the installer, but that's far and away more files than I want. It also means that it's probably going to take me a half-hour now to build binaries that used to take about six seconds each.

Had they been following good I18n hygiene, they'd have asked themselves (or me, even) whether there were costs associated with consolidating all of the lower-level makefiles and eliminating the possibility of rebuilding except in this huge batch. The costs don't really affect them that much, though they'll slow me down somewhat.

It's an "improvement."


27 November 2006

Pulling the rug out from under the Localization Manager

It's a thrill for the gearhead in me to build my own localized binaries.

Most projects in the wide world don't require this of the localization manager, of course. A staff engineer, or at least a release engineer, is usually tasked with building the binaries that house the localized software resources. There's some delay involved in that, though, since the engineers don't often place very high priority on building these infernal things, let alone building them as often as the localization QA cycles require.

The localizers are able to preview the localized resources in their localization environment (Alchemy, Visual Studio, etc.), but our engineers have an arcane build environment and procedure that I don't care to impose on even my least liked localization vendor, simply because it's an open invitation to failure. Instead, I persuaded the engineer who created the entire scheme to spend three hours duplicating the environment with me so that I could document it and reproduce it on a quick turnaround.

"Why are you going to all this trouble?" the engineer asked me.

"I'm trying to drive you crazy this one time so that I don't drive you crazy eight or nine times over the next few weeks. The translators will find new things to change as they continue localizing the rest of the product, and they'll change the resource files. If I have to bug you with each one of these changes, you may come to view localization as something, well, inconvenient."

"Good. Thanks for sparing me that."

That was for version 2.0.0 of this software. I was able to save precious days by doing my own builds and turning the binaries around to the localizers promptly. I also saved myself all of the credibility and Brownie points I'd have had to mortgage by running to the engineers all the time.

Now that we're localizing version 2.0.1, however, the procedure has changed. The engineers have pulled the rug out from under me, and nothing that used to work, works. Time to bug the engineer again and get the updated Rosetta Stone so that I can build these things.


30 September 2006

The Localization Consultant Amid His Buckets

After a few hours hunched over Beyond Compare, I've sorted the deltas between version 3.9 and version 4 into several buckets:
  1. New Content, based on filenames appearing for the first time in this version - 718 files
  2. Content Unchanged, except for the datestamp at the bottom of the page - 727 files
  3. Content Changed, but with changes that do not require translation (HTML tags, formatting) - 1517 files
  4. Other, including content with translatable changes and anything else - 319 files
My hope is that the vendor can hand off to the Japanese translators only those pages in which there is real translation work, then internally take care of #2 and #3 with search-and-replace and other engineering techniques to bring the 3.9 pages into parity with the 4.0 pages. For that matter, I could probably do the engineering myself, except that: 1) it's boring work; and 2) the vendor needs to update translation memory with the results.
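
For the curious, the triage itself is mostly a matter of normalizing each pair of files and comparing. Here's a rough sketch of the idea in Python; the folder names and the datestamp pattern are invented, and the comparison rules I actually tuned in Beyond Compare are more nuanced than this:

    # Rough sketch of the bucket triage; folder names and patterns are invented.
    import re
    from pathlib import Path

    OLD, NEW = Path("help_3.9"), Path("help_4.0")
    datestamp = re.compile(r"Last updated:[^\n]*", re.IGNORECASE)
    tags = re.compile(r"<[^>]+>")

    def normalize(path, strip_tags=False):
        text = path.read_text(encoding="utf-8", errors="replace")
        text = datestamp.sub("", text)
        return tags.sub("", text) if strip_tags else text

    buckets = {"new": [], "unchanged": [], "format only": [], "other": []}
    for page in sorted(NEW.glob("*.htm*")):
        old = OLD / page.name
        if not old.exists():
            buckets["new"].append(page.name)
        elif normalize(old) == normalize(page):
            buckets["unchanged"].append(page.name)      # datestamp-only differences
        elif normalize(old, True) == normalize(page, True):
            buckets["format only"].append(page.name)    # tags changed, text did not
        else:
            buckets["other"].append(page.name)

    for name, files in buckets.items():
        print(name, len(files))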

We'll see how this goes. It doesn't help that the original English files have a lot of formatting errors in them, and that errors in the Perl scripts wipe out the content on several dozen pages, which land in the CHM blank.


25 September 2006

Doing the Localization Vendor's Work?

Sometimes I know too much about this process.

Or, maybe I'm just too nice a guy.

To make things easier for the vendor (and cheaper for me) I've resolved to carve the 3200 HTML files in the API Reference CHM into different buckets, depending on whether and how much they require translation vs. engineering. Naturally, the ultimate arbiter is the Trados or SDLX analysis that the vendor will perform, but I've already mentioned my concern about false positives and need write no more on the topic here.

My tool of choice is the extremely capable Beyond Compare, which, at US$30, is worth it just to see how well thought-out a software package it is. I compare version 3.9 files against version 4 files, tuning the comparison rules to groom the file buckets as accurately as possible.

The distribution is not perfect, if for no other reason than that its first level of triage is the filename and not the file contents, but it's better than guessing, and it's much better than thousands of false positives.

Once I've gone through the files, I'll have a better idea of how to label the buckets in a way that meets both my needs and those of the vendor.

At least, I think I'm being too nice a guy. Maybe this is just a big pain for the vendor, and they're too polite to inform me of that.
