29 May 2008

Localizing Robohelp Files - The Basics

We get a lot of search engine queries like "localize Robohelp file" and "translate help project." I'm pretty sure that most of them come from technical writers who have used Robohelp to create help projects (Compiled HTML Help Format), and who have suddenly received the assignment to get the projects localized.

The short answer
Find a localization company who can demonstrate to your satisfaction that it has done this before, and hand off the entire English version of your project - .hpj, .hhc, .hhk, .htm/.html and, of course, the .chm. Then go back to your regularly scheduled crisis. You should give the final version a quick smoke test before releasing it, for your own edification as well as to see whether anything is conspicuously missing or wrong.

The medium answer
Maybe you don't have the inclination or budget to have this done professionally, and you want to localize the CHM in house. Or perhaps you're the in-country partner of a company whose product needs localizing, and you've convinced yourself that it cannot be that much harder than translating a text file, so why not try it?

You're partially right: it's not impossible. In fact, it's even possible to decompile all of the HTML pages out of the binary CHM and start work from there. But your best bet is to obtain the entire help project mentioned above and then use translation memory software to simplify the process. Once you've finished translating, you'll need to compile the localized CHM using Robohelp or another help-authoring product (even hhc.exe).

The long answer
This is the medium answer with a bit more detail and several warnings.
  • There may be a way to translate inside the compiled help file, but I wouldn't trust it. Fundamentally, it's necessary to translate all of the HTML pages, then recompile the CHM; thus, it requires translation talent and some light engineering talent. If you don't have either one, then stop and go back to The Short Answer.
  • hhc.exe is the Microsoft HTML Help compiler that comes with Windows. It's part of the HTML Help Workshop freely available from Microsoft. This workshop is not an authoring environment like Robohelp, but it offers the engineering muscle to create a CHM once you have created all of the HTML content. If you have to localize a CHM without recourse to the original project, you can use hhc.exe to decompile all of the HTML pages out of the CHM.
  • Robohelp combines an authoring environment for creating the HTML pages and the hooks to the HTML Help compiler. As such, it is the one-stop shopping solution for creating a CHM. However, it is known to introduce formatting and features that confuse the standard compiler, such that some Robohelp projects need to be compiled in Robohelp.
  • Robohelp was developed by BlueSky Software, which morphed into eHelp, which was acquired by Macromedia, which Adobe bought. Along the way it made some decisions about Asian languages that resulted in the need to compile Asian language projects with the Asian language version of Robohelp. This non-international approach was complicated by the fact that not all English versions of Robohelp were available for Asian languages. Perhaps Adobe has dealt with this by now, but if you're still authoring in early versions, be prepared for your localization vendor to tell you that it needs to use an even earlier Asian- language version.
  • Because the hierarchical table of contents is not HTML, you may find that you need to assign to it a different encoding from that of the HTML pages for everything to show up properly in the localized CHM, especially in double-byte languages.
  • The main value in a CHM lies in the links from one page to another. In a complex project, these links can get quite long. Translators should stay away from them, and the best way to accomplish that is with translation memory software such as Déjà Vu, SDL Trados, across or Wordfast. These tools insulate tags and other untouchable elements from even novice translators.
We've marveled at how many search engine queries there are about localizing these projects, and we think that Robohelp and the other authoring environments have done a poor job explaining what's involved.

If you liked this article have a look at "Localizing Robohelp Projects."

Labels: , , , , , , , ,

05 October 2007

"Why are you charging me for that?" - Part 1

Have you ever asked your localization vendor this question? Or, if you're a vendor, has any client ever asked it of you?

For a few clients, we manage large documentation projects, notably HTML Help and Robohelp localization. When the vendor translated 800 HTML pages for version 1.0 of the product, a particular client swallowed hard and paid for all non-matches, because it was the first time localizing the product.

By version 2.0, the Help had grown to 1400 pages. Many of the original 800 pages had no translatable changes, but Trados dutifully scooped up all of those words, dropped them into the "100%" or "95-99%" buckets, and the vendor charged us for them, even if at a greatly discounted rate.

"Why are you charging me for that?" I asked. I'll have more on this topic in an upcoming post, but for now:

If you're on the vendor-side, do you have a good answer for that question? If you're on the client-side, have you ever received an answer to that question that satisfied you?

Labels: , , ,

10 August 2007

Localization - Top 5 Web searches

What is your most frequently used Web search regarding localization? Are there search phrases you check every now and again to see what new results they yield?

Over the last couple of years, I've tuned the keywords on this blog and on our Web site, www.1-for-all.com, for both pay-per-click and search engine optimization. I have a pretty good idea of which search topics bring people to this blog, and here are the top five topics, with my comments:
  1. Localization of HTML help projects (Robohelp, CHM, etc.). I can't tell whether people have trouble with this, or whether they're poking around to find out whether they are going to have trouble with it once they undertake it. My hunch is that Robohelp, the dominant product for creating HTML help, either doesn't do a good job creating localized help systems or doesn't do a good job in explaining how to create them. Our experience has been that double-byte localization requires a specifically enabled, separate version of Robohelp, which strikes me as silly, but perhaps Adobe has addressed this by now.
  2. How much to charge/pay for translation. Everybody wants to know this. Responding to the frequency of these queries, I wrote an article called "Going Global Without Going Broke" to help people who want a few benchmark figures from which to cobble together a budget. If you're any further along than that, you should just contact a vendor, push your files to him and get an estimate. If you're a translator or want to become one, phone a localization company, tell them what you can do and find out what they'll pay you.
  3. Localization project manager/management. I would guess that about half of these are vendors (a.k.a localization service providers, or LSPs) and half are companies with localization needs to fill.
  4. Localization jobs. Most of these queries come from Ireland. There's a relatively high concentration of localization talent in that country, and perhaps a high rate of turnover as well.
  5. What is localization? Again, the frequency of these queries prompted me to write articles called "Opening the Black Box." I'm glad to see people asking this question, because it demonstrates continuing and continuing interest in this specialty. At the same time, however, I notice that some of these queries come from China and India, suggesting to me that the IT shop which has just promised you it can localize your software for one-seventh the price you've gotten from other vendors, is now trying to figure out what's involved in fulfilling that promise.
At the other end of this list are the searches we're not seeing: questions we believe people should be asking but aren't.

Labels: , , , , ,

02 March 2007

Translation non-savings, Part II

Again I ask: How far will you go to improve your localization process? If a big improvement didn't save any obvious money, would your organization go for it?

I selected a sample of 180 files. In one set, I left all of the HTML tags and line-wrapping as they have been; in the other set, I pulled out raw, unwrapped text without HTML tags. My assumption was that the translation memory tools would find more matches in the raw, unwrapped text than in the formatted text.

I cannot yet figure out how or why - let alone what to do about it - but the matching rate dropped as a result of this experiment.























Original HTML Formatting and TagsUnwrapped, unformatted text
100% match and Repetitions65%51%
95-99% match9%14%
No match9%15%

This is, as they say in American comedy, a revoltin' development. It means that the anticipated savings in translation costs won't be there - though I suspect that the translators themselves will spend more time aligning and copy-pasting than they will translating - and that I'll have to demonstrate process improvement elsewhere. If I can find an elsewhere.


True, the localization vendor will probably spend less time in engineering and file preparation, but I think I need to demonstrate to my client an internal improvement - less work, less time, less annoyance - rather than an external one.

Labels: , , , , , ,

26 February 2007

Translation non-savings, Part I

How far will you go to improve your localization process?

Because of how localization is viewed in many companies, the best improvements are the ones that lower cost. Low cost helps keep localization inconspicuous, which is how most companies want it.

But if a big improvement didn't save any obvious money, would your organization go for it?

Elsewhere in this blog I relate the saga of the compiled help file with 3500+ HTML pages in it. These pages come from a series of Perl scripts that we run on the header files to extract all of the information about the product's API and wrap it up in a single, indexed, searchable CHM. In a variety of experiments, we've sought to move the focus of translation from the final HTML files to a point further upstream, at or near the header files themselves. If the raw content were translated, we believe, all downstream changes in the Perl scripts, which get revised quite often, would be imposed automatically on the localized CHM.

One of the biggest cost items - we have suspected - is due to changes in line wrapping and other HTML variations that confuse TM tools into thinking that matches are fuzzier than they really are. The false positives look like untranslated words when analyzed, so the wordcounts rise, and not in our favor.

"If we work with raw text, before HTML formatting," our thinking goes, "the match rate will rise."

Not.

I'll describe my experiment shortly.

Labels: , , , , ,

30 January 2007

Localization Train slowing

We're seeing the localization juggernaut lose some steam.

In the early years, this client localized its flagship software package for developers in China, Japan and Korea (CJK), then added Brazil. It took small, reference applications into as many as 10 languages (including Hebrew and Thai) as those markets showed promise. The budget was pretty fat, the localized products were freshened frequently, and the developers were happy to have software and doc in their own language.

I suppose it was to be expected that this would peter out with time, because markets change, business cases wax and wane, and some regions never return the investment.

The new stressor on localization was less easy to anticipate: bulk. Each generation of improvements to the product brings several hundred more pages of documentation. All of this new documentation is, of course, "free" in English, but somebody has to pull out a checkbook to deal with it in other languages, and that checkbook comes out more slowly and with more misgivings these days.

Engineering and Product Management furrow their brow nowadays when I walk in with cost estimates. I've adapted to this change in attitude with a few techniques:
  1. The Technical Reference is the fattest target and the source of most of the expansion. It lives in a compiled help file (CHM) that is no longer written by Tech Pubs, but generated by Perl scripts from header files written by the engineers. Our modus localizandi has been to hand off the finished help project, now comprising 3700 HTML files, and have the HTML translated. In an effort to lower cost, I'm attempting a proof-of-concept to localize the header files themselves, then tune the scripts to convert them into localized HTML. This should lower our localization engineering costs considerably.
  2. I agitate for interim localization updates, peeling off documentation deltas every few weeks and handing them off for translation, even if there are no plans to release them yet. This reduces the sticker shock and time-to-market delay that comes of getting an estimate on a release only when necessary, which may be a 10- to 18-month interval. Product Management and Engineering, who only think about localization when it's absolutely unavoidable, find the tsunami of untranslated text depressing.
  3. Although it's not a very clean way of doing things, I screen from the localization handoff those items that I know have little to be translated. Sometimes I go to the level of resource files, but more often I take documents to which only a few minor changes have been made from one En version to the next, hand off changed text, then place the translations myself. This is not for the faint of heart, nor for those who don't really know the languages involved, but it can save some money.
  4. I try to keep global plates spinning, in the hope that more people will consider the global dimension of what we do, and the fact that localization is the necessary step for making your product acceptable to people whose use of your product will make you money, if you make it easy for them.
  5. I never impart bad news on Friday.

Labels: , , , , ,