29 May 2008

Localizing Robohelp Files - The Basics

We get a lot of search engine queries like "localize Robohelp file" and "translate help project." I'm pretty sure that most of them come from technical writers who have used Robohelp to create help projects (Compiled HTML Help Format), and who have suddenly received the assignment to get the projects localized.

The short answer
Find a localization company that can demonstrate to your satisfaction that it has done this before, and hand off the entire English version of your project - .hhp, .hhc, .hhk, .htm/.html and, of course, the .chm. Then go back to your regularly scheduled crisis. You should give the final version a quick smoke test before releasing it, for your own edification as well as to see whether anything is conspicuously missing or wrong.

The medium answer
Maybe you don't have the inclination or budget to have this done professionally, and you want to localize the CHM in house. Or perhaps you're the in-country partner of a company whose product needs localizing, and you've convinced yourself that it cannot be that much harder than translating a text file, so why not try it?

You're partially right: it's not impossible. In fact, it's even possible to decompile all of the HTML pages out of the binary CHM and start work from there. But your best bet is to obtain the entire help project mentioned above and then use translation memory software to simplify the process. Once you've finished translating, you'll need to compile the localized CHM using Robohelp or another help-authoring product (even hhc.exe).
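For the engineering half of that round trip, here is a minimal sketch of the two command lines involved, wrapped in Python. It assumes a Windows machine, that hh.exe (the help viewer that ships with Windows) is on the PATH, and that HTML Help Workshop is installed at the path shown; the file and folder names are hypothetical. Decompiling may not give you back a project (.hhp) file, which is one more reason to start from the original project if you can get it.

```python
import subprocess

# Assumption: hh.exe, the HTML Help viewer, is on the PATH (it ships with Windows).
HH_EXE = "hh.exe"
# Assumption: a typical HTML Help Workshop install location; adjust for your machine.
HHC_EXE = r"C:\Program Files (x86)\HTML Help Workshop\hhc.exe"


def decompile_command(chm, out_dir):
    """Build the command that extracts the files inside a CHM into out_dir."""
    return [HH_EXE, "-decompile", str(out_dir), str(chm)]


def compile_command(hhp):
    """Build the command that compiles a help project (.hhp) into a CHM."""
    return [HHC_EXE, str(hhp)]


def run(command):
    """Run a command and hand back its exit code without judging it.

    hhc.exe reportedly does not follow the usual zero-means-success
    convention, so check the compiler's log and the resulting CHM
    instead of trusting the number.
    """
    return subprocess.run(command).returncode
```

On a real project you would call run(decompile_command("help.chm", "extracted")), translate the extracted pages, and then call run(compile_command("help_ja.hhp")).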

The long answer
This is the medium answer with a bit more detail and several warnings.
  • There may be a way to translate inside the compiled help file, but I wouldn't trust it. Fundamentally, it's necessary to translate all of the HTML pages, then recompile the CHM; thus, it requires translation talent and some light engineering talent. If you're missing either one, then stop and go back to The Short Answer.
  • hhc.exe is the Microsoft HTML Help compiler. It does not ship with Windows; it's part of the HTML Help Workshop, freely available from Microsoft. The Workshop is not an authoring environment like Robohelp, but it offers the engineering muscle to create a CHM once you have created all of the HTML content. If you have to localize a CHM without recourse to the original project, you can decompile the HTML pages out of it with the Workshop's Decompile command or with hh.exe -decompile (hh.exe, the help viewer, does come with Windows).
  • Robohelp combines an authoring environment for creating the HTML pages and the hooks to the HTML Help compiler. As such, it is the one-stop shopping solution for creating a CHM. However, it is known to introduce formatting and features that confuse the standard compiler, such that some Robohelp projects need to be compiled in Robohelp.
  • Robohelp was developed by BlueSky Software, which morphed into eHelp, which was acquired by Macromedia, which Adobe bought. Along the way, its owners made some decisions about Asian languages that resulted in the need to compile Asian-language projects with the Asian-language version of Robohelp. This non-international approach was complicated by the fact that not all English versions of Robohelp were available for Asian languages. Perhaps Adobe has dealt with this by now, but if you're still authoring in early versions, be prepared for your localization vendor to tell you that it needs to use an even earlier Asian-language version.
  • Because the hierarchical table of contents is not HTML, you may find that you need to assign to it a different encoding from that of the HTML pages for everything to show up properly in the localized CHM, especially in double-byte languages.
  • The main value in a CHM lies in the links from one page to another. In a complex project, these links can get quite long. Translators should stay away from them, and the best way to accomplish that is with translation memory software such as Déjà Vu, SDL Trados, across or Wordfast. These tools insulate tags and other untouchable elements from even novice translators.
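To make the idea of insulating tags concrete, here is a rough sketch in Python of what such tools do in principle: split a page into tags and text, pass only the text to the translator, and put everything back in its original order. This is not how any of the products named above works internally. The translate callback and the sample link are hypothetical, and the simple pattern assumes no ">" inside attribute values and no script or style blocks.

```python
import re

# Anything between < and > is treated as a tag and left untouched.
TAG = re.compile(r"(<[^>]+>)")


def translate_html(html, translate):
    """Apply translate() to the text between tags; copy tags through verbatim."""
    out = []
    # Because the pattern has a capturing group, split() keeps the tags.
    for piece in TAG.split(html):
        if TAG.fullmatch(piece) or not piece.strip():
            out.append(piece)  # a tag, or whitespace only: protected
        else:
            out.append(translate(piece))  # translatable text
    return "".join(out)


# Stand-in "translation" that upper-cases the text so the effect is visible.
page = 'See <a href="../topics/setup/install_step_2.htm">Installing</a> first.'
result = translate_html(page, str.upper)
```

The long href comes through unchanged, while the three text runs around it are the only things the translator ever sees.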
We've marveled at how many search engine queries there are about localizing these projects, and we think that Robohelp and the other authoring environments have done a poor job explaining what's involved.

If you liked this article, have a look at "Localizing Robohelp Projects."


22 May 2008

If it isn't broken...break it!

What's the most effective way to bump up your translation costs unnecessarily?

Probably by localizing something that nobody will ever want in a foreign language, of course. But nobody would ever approve an expense like that, so it wouldn't have the opportunity to affect your translation costs.

There's a much sneakier, more pernicious way of wasting translation money: Tinkering with the original text (for example, English).

Suppose you localized your product or documentation from 2002 through 2007. You'd have five years' worth of translation memory (TM) economies and glossary entries going for you, with thousands of exactly matched words that incurred no translation cost from one version to the next. Then suppose that someone decided in 2008 to go in and "clean up" the original English text to make it more "readable" or "user-friendly."

What do you think would happen the next time you handed off this content for TM analysis? Suddenly, non-matches would pop up where exact matches used to be. Among the causes:
  • Combining short sentences
  • Breaking long sentences apart
  • Making stylistic changes to common terms (e.g., changing "phone" to "telephone" or "handset")
  • Standardizing disparate terms (e.g., selecting one of "Proceed as follows," "Perform the following steps," "Following is the required procedure" and propagating throughout the documentation)
  • Typographical or grammatical corrections
You might tolerate these modifications in the interest of improving your product in all languages - not just English - but the sad truth is that you may find that they make no difference in the localized products. You'd pay for words that the translator did not need to touch. This is an unfortunate artifact of the way in which translation jobs are estimated, but the analysis software cannot predict that the changes will make no difference to the translation; only the translator sees that.

Note that re-organizing content should not cost you additional translation money; as long as the sentence is the same (i.e., an exact match), it doesn't matter where it's located in the product.
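Here is a toy version of a TM analysis in Python that shows both effects: moving a sentence costs nothing, while rewording it drops it out of the exact-match band. It is a sketch only. Real TM tools use their own segmentation, scoring and match bands; the 0.75 threshold, the band names and the sample sentences are assumptions of mine.

```python
from difflib import SequenceMatcher


def analyze(segments, memory, fuzzy_threshold=0.75):
    """Count words per match band for new segments against a set of old ones."""
    counts = {"exact": 0, "fuzzy": 0, "new": 0}
    for segment in segments:
        words = len(segment.split())
        if segment in memory:
            # Position in the document is irrelevant; only the text matters.
            counts["exact"] += words
            continue
        # Otherwise, find the closest previously translated sentence.
        best = max(
            (SequenceMatcher(None, segment, old).ratio() for old in memory),
            default=0.0,
        )
        band = "fuzzy" if best >= fuzzy_threshold else "new"
        counts[band] += words
    return counts


memory = {"Press the phone button.", "Proceed as follows."}

# Same sentences in a different order: still all exact matches.
reordered = analyze(["Proceed as follows.", "Press the phone button."], memory)

# One stylistic change ("phone" to "telephone"): no longer an exact match.
edited = analyze(["Press the telephone button."], memory)
```

In this run, reordered reports seven exact-match words, while edited reports four fuzzy-match words that would now be billed at some rate, even though the translation may not change at all.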

So, are you better off leaving errors and other undesirables in your original-language content? No. It would be a mistake to let concern for translation cost impede your product improvement effort, like having the tail wag the dog. Still, to the extent you can control it, you should try to avoid purely stylistic changes that make no difference in how your customers use your product. A good editor can make a hundred such changes per hour, not realizing the ramifications for translation costs.

If you learned something from this post, you might like to read "Improved Docs through Localization" or "Getting the Writers to Care about Localized Documents."


15 May 2008

Doxygen and localization

Are you localizing any documentation projects that use Doxygen? It's an open-source tool for documenting source code.

If your documentation set includes things like an API reference or extensive details in programming code, Doxygen allows you to embed tags in the original code or header files, then automatically create entire help systems organized around the tagged text. Doxygen does not compile anything, but takes the tagged bits of source files, turns them into HTML pages, then links them for viewing in a browser.

Like most tools, it's a breath of fresh air when it works properly, but it can require a lot of re-plumbing and retrofitting.

As far as localization goes, it can be a life-saver. In theory, you can have the header files themselves localized, then run them through Doxygen as you would the original English files. Working this far upstream can be a big advantage.

Some months ago a client embarked on a conversion of a help system to Doxygen. While it was still in the proof-of-concept stage, we pseudo-translated some header files and tested the tool for global-readiness.
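Pseudo-translation of this kind can be as simple as swapping ASCII letters for characters that behave like the target language. The Python sketch below is one hypothetical way to do it, not the method we used on that project: it turns letters and digits in a comment into their fullwidth (double-byte in Shift-JIS) forms and leaves Doxygen commands such as @brief alone. It does not protect the parameter names that follow @param, which a real pass would need to do.

```python
import re

# Doxygen commands start with @ or a backslash and must survive untouched.
COMMAND = re.compile(r"[@\\][A-Za-z]+")

# Fullwidth forms sit at a fixed offset from their ASCII counterparts.
FULLWIDTH_OFFSET = 0xFEE0


def to_fullwidth(text):
    """Replace ASCII letters and digits with their fullwidth equivalents."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if ch.isascii() and ch.isalnum() else ch
        for ch in text
    )


def pseudo_translate(comment):
    """Pseudo-translate comment text while copying Doxygen commands verbatim."""
    parts = []
    pos = 0
    for match in COMMAND.finditer(comment):
        parts.append(to_fullwidth(comment[pos:match.start()]))
        parts.append(match.group(0))  # the command itself stays as-is
        pos = match.end()
    parts.append(to_fullwidth(comment[pos:]))
    return "".join(parts)


sample = pseudo_translate("@brief Open file")
```

Running the pseudo-translated headers through the tool shows quickly where double-byte text gets corrupted, before any money is spent on real translation.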

The good news is that the developers of Doxygen have enabled it for multiple languages. It encodes pages in UTF-8 (or other character sets), so translated text displays properly in the browser. It's possible to set the OUTPUT_LANGUAGE parameter to your target language (e.g., Japanese, in our test scenario) so that the datestamp and other text supplied by Doxygen display in Japanese, rather than in the default English.

There are some I18n problems with Doxygen, though.
  • Each header file page begins with "Collaboration diagram for" followed by the page title. When the page title contains double-byte characters, the Japanese characters for "Collaboration diagram for" are corrupted. It appears that Doxygen is not pushing UTF-8 characters for this phrase, though it pushes UTF-8 characters in other places.
  • Some hyperlinked words in body text will require translation. If so, it will be important to ensure that they are translated consistently everywhere. Note, however, that Doxygen will not generate the necessary file if the hyperlink has double-byte characters in it (not even on a Japanese OS).
  • Doxygen allows for generation of the .hhc, .hhp and .hhk files needed for Compiled HTML Help (CHM). It can also be configured to execute hhc.exe and compile the project. However, Doxygen outputs the .hhc file in UTF-8 format, which is incompatible with the table of contents pane in the Help viewer. To fix this, open the .hhc in Notepad (preferably on a Japanese OS) and save it back out as Shift-JIS ("ANSI" in Japanese Notepad). Then recompile the CHM by invoking hhc.exe from the command line and the contents will show up properly.
  • Searches using single- or double-byte characters do not work in the resulting CHM.
These strike me as rather large, empty boxes on the checklist of global-readiness. Still, the source code is available, so if your organization has already started down the Doxygen path, you can clean up problems like these for your worldwide versions.
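The Notepad step in the .hhc workaround above can also be scripted, which helps if you rebuild often. Here is a minimal Python sketch, assuming that code page 932 (Python's "cp932" codec) is what Japanese Notepad writes as "ANSI"; the function name and the file name in the usage note are hypothetical.

```python
from pathlib import Path


def reencode_hhc(path, target_encoding="cp932"):
    """Rewrite a UTF-8 .hhc file in place using a legacy code page.

    Working on bytes keeps the original line endings intact. A character
    with no equivalent in the target code page raises UnicodeEncodeError,
    which is better than silently writing a damaged contents file.
    """
    source = Path(path)
    # "utf-8-sig" also strips a byte-order mark if one is present.
    text = source.read_bytes().decode("utf-8-sig")
    source.write_bytes(text.encode(target_encoding))
```

Call reencode_hhc("index.hhc") after Doxygen has generated the files, then recompile the CHM with hhc.exe from the command line.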

Interested in this topic? You might enjoy another article I've written called "Localizing Robohelp Projects."


08 May 2008

ISDN (I still don't know) about Localization

"There are still a zillion people who don't know about localization," the sales representative of the localization company told me. "Can you believe it? After all these years?"

Yes, I suppose I can. We can make sales calls and deliver presentations on the most efficient ways to localize until we're all ready to retire, and there will still be executives, companies and entire industries that haven't gotten the memo.

It's refreshing in some ways, and it keeps us from getting lazy. It reminds me of the ISDN craze around Internet access back in the mid-1990s, before cable and DSL made our choices simple (at least in the USA).

ISDN, or Integrated Services Digital Network, was a high-speed alternative to dial-up, but the phone companies were not very successful in taking the service from the early adopters to the early majority. The acronym was redefined as "I still don't know," because most people couldn't understand the service well enough (or afford it, for that matter) to see how it would benefit them.

The upside: There are still, and will be for a long time, opportunities to sell translation and localization services. As soon as all of our customers know about localizing products and how to do it efficiently, they'll turn to The Next Thing, such as John Yunker's Web Globalization Report Card threshold of localizing the Web site into 20 languages. We won't run out of work, provided we stay a few steps ahead of our customers' requests.

The downside: We may spend a little less time educating new clients, but we're not completely out of the hand-holding business yet. Salespeople will still need to update their presentations and drag an engineer or project manager to that second-round meeting with the prospective client.

Just be sure to stay on top of localization developments and techniques so that you don't have to answer a prospect's question with "ISDN" (I still don't know).


01 May 2008

Web Localization and the Cobbler's Children

"Why don't we have our Web site localized?" my business partner asked. "We're in the business, and a localized site would show that we're willing to put our money where our mouth is."

Excellent question. Why not get our site, or at least the pages that pertain to localization, localized? So I looked into it.

It was going to cost about US$2000 per language, when all was said and done, so I asked my partner if he'd be willing to split the cost with me. Perhaps you can guess his answer.

It was an interesting issue, though. Assume that a prospective customer, who doesn't know much about the industry, goes shopping for a vendor. She finds a vendor whose site is in only one language, and another whose site is in eight languages. Which vendor has more credibility, especially to somebody who doesn't know (or even want to know) a lot about localization?

Mind you, I'm not completely representative of the entire industry. I'm not a "language service provider," so that bit of credibility is of no great advantage to me. Still, it brings up the old chestnut about the cobbler's children running barefoot: Isn't it odd to be in localization, yet not have a localized Web presence?

My rationale, aside from the expense, is that almost nobody who would want our services would want to read about them in any language other than English. That's probably the case for almost everyone in the American localization industry, where the dominant language conveniently matches the world's current lingua franca. Other languages just confuse most Americans anyway, so one could argue that it would be a needless distraction in the sales cycle.

What do you think your customers and prospects want to see? Can you get by with your marketing presence (Web, collateral, datasheets) in one language?

If you enjoyed this article, have a look at "Why Localize at All?"
