31 July 2008

International Keyboard Frenzy

My wife is traveling through Europe, sending us e-mail from Internet cafés along the way. Here's one I received this morning:

thnks for the msgs.  i luv zou and miss zou 9sorrz i hav a bratixlava kezboard0. will trz longer message in a few dazs. 
love, hugs and kisses.
She's actually a pretty good typist, but she was flummoxed by the keyboard on the computer she used in Bratislava, Slovakia, because several of the keys are in different places from where her fingers expected them to be on a U.S.-English keyboard. The interface between fingers and keys is a fragile one in computing. 
Of course, my wife could have tinkered with the Regional Settings control panel (Windows) or International system preference (MacOS) to disregard the hardware keyboard and interpret the keystrokes according to any other supported keyboard layout (like U.S.-English), but machines in Internet cafés are probably not set up to allow that kind of modification without administrative permission.
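The garbling in that message is systematic enough to undo mechanically. Here is a minimal sketch in Python, assuming a layout on which Y and Z trade places and on which the keystrokes that give parentheses on a U.S. keyboard produced 9 and 0 instead. The mapping is inferred from the message above, not taken from any layout specification, and the name `untangle` is purely illustrative.

```python
# Assumed mapping, inferred from the message itself: Y and Z trade places,
# and the keystrokes that give ( and ) on a U.S. keyboard gave 9 and 0.
QWERTZ_TO_US = str.maketrans("yzYZ90", "zyZY()")

def untangle(text: str) -> str:
    """Reinterpret text typed by U.S.-trained fingers on the assumed layout."""
    return text.translate(QWERTZ_TO_US)

print(untangle("i luv zou and miss zou 9sorrz i hav a bratixlava kezboard0."))
# i luv you and miss you (sorry i hav a bratixlava keyboard).
```

It's a toy: any genuine 9, 0 or z in the message would be mangled in turn, which is why remapping the keyboard before typing beats repairing the text afterward.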
Thosé of üs whö frequently wríte with çharacterß from othër langüages not natively supported by our hardware need keyboard tricks to do so.
DOS
  • Are there any dinosaurs out there besides me who remember how to do this? To generate ü on a U.S.-English keyboard, for example, you had to hold down the left Alt key and enter 129 on the keypad. The left Alt key gave access to the extended characters, those with codes of 128 and above, beyond standard ASCII.
  • I don't think Latin-based operating systems supported non-Roman characters; you had to either buy that version of the OS or get special software to add the functionality. (Who cares? It's ancient history.)
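For anyone curious why 129 meant ü: assuming the machine used code page 437, the original IBM PC character set (other DOS code pages assign some codes differently), the keypad number is simply the character's byte value. Python's built-in cp437 codec shows the mapping; `alt_code` is a hypothetical helper name.

```python
def alt_code(number: int, codepage: str = "cp437") -> str:
    """Return the character stored at this byte value in the given code page."""
    return bytes([number]).decode(codepage)

print(alt_code(129))  # ü
print(alt_code(130))  # é
print(alt_code(164))  # ñ
```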
WINDOWS
  • U.S.-English users can use the U.S.-International keyboard layout to generate accented Latin characters like ëüöàñçß¿¡. I use it as my default mapping. It takes a bit of getting used to, because the quotation mark key (" ') becomes a dead key: you hit it before the letter you want to accent. 
  • You can also Insert Symbol in most Windows applications, but this is clunky. 
  • For Asian and other non-Latin characters, or to map a different soft keyboard over your hardware keyboard, enable a different input language in the Regional Settings control panel. (This may require installing additional fonts in some exotic languages.)
MAC OS
  • Right out of the box, you can use the same keyboard tricks that have been in place since System 7. Option + e tells the OS that you want an accent aigu over the next character, such as e or a; option + u generates the diaeresis or umlaut over the next character, and other option + combinations result in other common accented Roman characters.
  • From the International system preference you can display a character palette in the desired language, then select the characters as you need them, or you can impose a software keyboard over your hardware keyboard. 
  • There's also full support for Asian and other non-Latin input methods, but again, you may need to install fonts (e.g., for Indic languages) from your original installer discs.
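The option-key tricks above boil down to "remember the accent, then combine it with the next letter." A rough sketch of that idea using Unicode normalization follows. The `DEAD_KEYS` table and `type_keys` function are hypothetical names for illustration, cover only the two combinations mentioned above, and are not how Mac OS actually implements it.

```python
import unicodedata

# Hypothetical dead-key table: key label -> Unicode combining mark.
DEAD_KEYS = {
    "option+e": "\u0301",  # combining acute accent
    "option+u": "\u0308",  # combining diaeresis
}

def type_keys(keys):
    """Turn a sequence of key presses into text, applying pending accents."""
    out = []
    pending = None
    for key in keys:
        if key in DEAD_KEYS:
            pending = DEAD_KEYS[key]  # remember the accent, emit nothing yet
        elif pending is not None:
            # NFC composes base letter + combining mark into one character
            out.append(unicodedata.normalize("NFC", key + pending))
            pending = None
        else:
            out.append(key)
    return "".join(out)

print(type_keys(["c", "a", "f", "option+e", "e"]))  # café
print(type_keys(["option+u", "u"]))                  # ü
```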
I have no doubt that these functions are elegantly handled in Unix/Linux variants as well, but I have the disadvantage of never spending time on them. Post a comment if you have useful tips on this.
How do you handle multilingual character input in your daily work?


29 May 2008

Localizing Robohelp Files - The Basics

We get a lot of search engine queries like "localize Robohelp file" and "translate help project." I'm pretty sure that most of them come from technical writers who have used Robohelp to create help projects (Compiled HTML Help Format), and who have suddenly received the assignment to get the projects localized.

The short answer
Find a localization company that can demonstrate to your satisfaction that it has done this before, and hand off the entire English version of your project - .hhp, .hhc, .hhk, .htm/.html and, of course, the .chm. Then go back to your regularly scheduled crisis. You should give the final version a quick smoke test before releasing it, for your own edification as well as to see whether anything is conspicuously missing or wrong.

The medium answer
Maybe you don't have the inclination or budget to have this done professionally, and you want to localize the CHM in house. Or perhaps you're the in-country partner of a company whose product needs localizing, and you've convinced yourself that it cannot be that much harder than translating a text file, so why not try it?

You're partially right: it's not impossible. In fact, it's even possible to decompile all of the HTML pages out of the binary CHM and start work from there. But your best bet is to obtain the entire help project mentioned above and then use translation memory software to simplify the process. Once you've finished translating, you'll need to compile the localized CHM using Robohelp or another help-authoring product (even hhc.exe).
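For the do-it-yourself route, the two command lines involved are short. The sketch below only builds them. It assumes the Windows HTML Help viewer hh.exe (with its -decompile switch) and the HTML Help Workshop compiler hhc.exe are both on the PATH, and the file names are hypothetical.

```python
import subprocess

def decompile_command(chm_path: str, output_dir: str) -> list[str]:
    """Command line asking hh.exe to extract a CHM's contents into output_dir."""
    return ["hh.exe", "-decompile", output_dir, chm_path]

def compile_command(hhp_path: str) -> list[str]:
    """Command line asking hhc.exe to build a CHM from an .hhp project file."""
    return ["hhc.exe", hhp_path]

def run(command: list[str]) -> int:
    """Run a command and return its exit code (these tools exist on Windows only)."""
    # check=False: hhc.exe is widely reported to return a nonzero code even
    # when the build succeeds, so the exit code alone is not a reliable signal.
    return subprocess.run(command, check=False).returncode

if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(decompile_command("product_en.chm", "extracted"))
    print(compile_command("product_de.hhp"))
```

After a build, checking that the .chm exists and opens is a safer test than trusting the exit code.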

The long answer
This is the medium answer with a bit more detail and several warnings.
  • There may be a way to translate inside the compiled help file, but I wouldn't trust it. Fundamentally, it's necessary to translate all of the HTML pages, then recompile the CHM; thus, it requires translation talent and some light engineering talent. If you don't have either one, then stop and go back to The Short Answer.
  • hhc.exe is the Microsoft HTML Help compiler. It does not ship with Windows itself; it's part of the HTML Help Workshop freely available from Microsoft. This workshop is not an authoring environment like RoboHelp, but it offers the engineering muscle to create a CHM once you have created all of the HTML content. If you have to localize a CHM without recourse to the original project, you can use the Workshop's Decompile command, or the hh.exe viewer that does come with Windows (hh.exe -decompile), to extract all of the HTML pages from the CHM.
  • Robohelp combines an authoring environment for creating the HTML pages and the hooks to the HTML Help compiler. As such, it is the one-stop shopping solution for creating a CHM. However, it is known to introduce formatting and features that confuse the standard compiler, such that some Robohelp projects need to be compiled in Robohelp.
  • RoboHelp was developed by BlueSky Software, which morphed into eHelp, which was acquired by Macromedia, which Adobe bought. Along the way it made some decisions about Asian languages that resulted in the need to compile Asian-language projects with the Asian-language version of RoboHelp. This non-international approach was complicated by the fact that not all English versions of RoboHelp were available for Asian languages. Perhaps Adobe has dealt with this by now, but if you're still authoring in early versions, be prepared for your localization vendor to tell you that it needs to use an even earlier Asian-language version.
  • Because the hierarchical table of contents is not HTML, you may find that you need to assign to it a different encoding from that of the HTML pages for everything to show up properly in the localized CHM, especially in double-byte languages.
  • The main value in a CHM lies in the links from one page to another. In a complex project, these links can get quite long. Translators should stay away from them, and the best way to accomplish that is with translation memory software such as Déjà Vu, SDL Trados, across or Wordfast. These tools insulate tags and other untouchable elements from even novice translators.
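To give a feel for what "insulating tags" means, here is a simplified sketch of the idea: swap every HTML tag for a numbered placeholder before translation, then swap the tags back afterward. It illustrates the principle only. It is not how any of the tools named above actually work, the function names and sample link are hypothetical, and the naive regular expression would trip over a ">" inside an attribute value.

```python
import re

TAG = re.compile(r"<[^>]+>")

def protect(segment: str):
    """Swap each HTML tag for a numbered placeholder; return text and tag list."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return f"{{{len(tags)}}}"  # produces {1}, {2}, ...

    return TAG.sub(stash, segment), tags

def restore(translated: str, tags) -> str:
    """Put the original tags back where the translator left the placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1)) - 1], translated)

source = 'See <a href="setup/install_overview.htm">Installing</a> first.'
text, tags = protect(source)
print(text)  # See {1}Installing{2} first.

# The translator works only on the placeholder text and may reorder it freely.
translated = "Lesen Sie zuerst {1}Installation{2}."
print(restore(translated, tags))
```

The translator never sees the href, so there is nothing in the link to break.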
We've marveled at how many search engine queries there are about localizing these projects, and we think that Robohelp and the other authoring environments have done a poor job explaining what's involved.

If you liked this article have a look at "Localizing Robohelp Projects."


10 January 2008

SDL TMS or Idiom WorldServer?

Question from one of the subscribers to this blog:

"We are in the process of bringing on a workflow tool. In general, for software, DITA/XML, and Frame files, do you prefer working with the SDL Translation Management System, or Idiom WorldServer, or another program? I have my own ideas, but I'm always curious to hear the opinions of other localization professionals."

I recommended asking pointed questions to ensure the chosen vendor/solution:
  • doesn't lock you in to a particular LSP, or out of freelance translators who won't have the tool
  • manages the native file formats, without conversion
  • allows you to talk to internal technical leads (not just salespeople)
  • offers integration with your version control system, so that you're not manually moving files to and from your engineering repository.
Those of you with experience using these tools, kindly comment.


16 November 2007

Where do your glossaries live?

The experienced project manager with your localization/translation vendor approaches a new client/project by asking you, "Has this ever been translated before?" Her big goal is to discover whether there's a translation memory database floating around, to help her translators do their work more quickly and keep your costs low, and her background goal is to find existing documents with key terms already translated and approved.

Smart companies maintain these key terms in a "glossary" or terminology list. Glossaries are far less comprehensive than translation memory because they serve a slightly different purpose: Instead of proposing a fuzzy-match translation for an entire sentence, they serve as a reference for the translators. Good translators know how to find translations for generally accepted terms like "closed-loop servomechanism" and "high-definition multimedia interface," but if the sales manager in your Shanghai office has already told you how he likes to see the word translated, everybody will be happier if that preference is observed.
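The difference from translation memory shows up in how a glossary gets used: as a term-by-term reference check rather than a sentence matcher. Here is a minimal sketch of such a check, assuming a simple source-term-to-approved-translation mapping. The function name and the sample entry are invented for illustration, and the plain substring test ignores inflected forms.

```python
def missing_terms(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """List glossary terms present in the source whose approved
    translation does not appear in the target."""
    problems = []
    for term, approved in glossary.items():
        if term.lower() in source.lower() and approved.lower() not in target.lower():
            problems.append(term)
    return problems

# Hypothetical entry: the in-country reviewer's preferred rendering.
glossary = {"dashboard": "Übersicht"}

print(missing_terms("Open the dashboard.", "Öffnen Sie das Dashboard.", glossary))
# ['dashboard']
print(missing_terms("Open the dashboard.", "Öffnen Sie die Übersicht.", glossary))
# []
```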

So where do your glossaries live?

"Live" is the important word, because glossaries change and grow with time. Most glossaries I've seen are in a spreadsheet or word processing document. While that's better than nothing, it can suffer from decentralization, since updates don't always make it to everybody involved in the project, and some translators run the risk of using old terminology.

One of my more localization-savvy clients makes its glossary available on its partner portal, requiring a login and password. The PHP-based application, which is actually hosted by a translation vendor, allows searching in multiple languages. My client deliberately does not make the glossary available for download or export; this ensures that everybody is using the same version with all updates.

I like this model. The assets reside on the client/owner's site, and the terminology "lives" with the linguistic experts, who can easily modify it. It's a bit more work for the translator, who would rather have a flat-file document, but overall it serves linguistic interests well. It's tried-and-true technology built into most computer-aided translation tools.

What are you doing with your glossaries?


02 March 2007

Translation non-savings, Part II

Again I ask: How far will you go to improve your localization process? If a big improvement didn't save any obvious money, would your organization go for it?

I selected a sample of 180 files. In one set, I left all of the HTML tags and line-wrapping as they were; in the other set, I pulled out raw, unwrapped text without HTML tags. My assumption was that the translation memory tools would find more matches in the raw, unwrapped text than in the formatted text.
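For readers who want to try something similar, here is one way to pull raw, unwrapped text out of an HTML file using only the Python standard library. It is a sketch of the general idea and not necessarily the procedure used for this experiment. The names `TextOnly` and `raw_unwrapped` are illustrative, and the function flattens everything, paragraph breaks included, into a single line.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect only the text between tags, discarding the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def raw_unwrapped(html: str) -> str:
    """Return the text of an HTML fragment with tags removed and lines unwrapped."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # split() with no argument collapses every run of whitespace,
    # including the line breaks left over from wrapped source lines.
    return " ".join("".join(parser.chunks).split())

print(raw_unwrapped("<p>Click <b>Save</b> to\n   store the file.</p>"))
# Click Save to store the file.
```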

I cannot yet figure out how or why - let alone what to do about it - but the matching rate dropped as a result of this experiment.

                               Original HTML formatting and tags   Unwrapped, unformatted text
100% match and repetitions     65%                                 51%
95-99% match                   9%                                  14%
No match                       9%                                  15%

This is, as they say in American comedy, a revoltin' development. It means that the anticipated savings in translation costs won't be there - though I suspect that the translators themselves will spend more time aligning and copy-pasting than they will translating - and that I'll have to demonstrate process improvement elsewhere. If I can find an elsewhere.


True, the localization vendor will probably spend less time in engineering and file preparation, but I think I need to demonstrate to my client an internal improvement - less work, less time, less annoyance - rather than an external one.


16 December 2006

Favorite Localization Tools

Here's a short list of Windows-based tools I use a great deal in managing localization projects:

Beyond Compare - Clients constantly grill me about the differences between the last version of their product and this version, with an eye to the order of magnitude of localization expense they're in for. Beyond Compare is the best tool I've found for finding the files that have changed, then comparing older and newer versions of files in a specialized viewer. Good technical support as well.

EmEditor - As long as you have the font and OS support installed, you can view multi-byte characters in their appropriate applications under English-language Windows, but EmEditor allows you to change the encoding of a text file to better display it, or so that you can edit it. My standard text editor is UltraEdit, which has excellent search-and-replace capability, but it's not as deft as EmEditor for multi-byte work on an English OS.

SDLX Glue - An obscure utility inside the SDLX suite, this will append up to I don't know how many hundred HTML files together. Translation vendors like it for work on big sites because it slashes the number of files being slung around. Naturally, it includes an unglue utility as well.

FAR - A technical writer introduced me to this utility, which includes a compiler system for HTML Help and MS Help. It will compile CHM files in any language such that, if you have a good HTML authoring tool, you don't need RoboHelp to build your CHMs. (Unfortunately, I've had problems when I've tried to use FAR on projects that have been created in RoboHelp, but there are some ways around them.)

Moreover, FAR stands for "Find And Replace", and this is hands down the best front end on regular expressions that I've ever found. The Holy Grail of search-and-replace is ignoring line breaks, and while regex supports that, not many utilities (that I've found) implement it. For instance, in the text

In a white room

with black curtains

at the station

if your goal was to find "room with black curtains at", most utilities would not be able to locate it because of the line breaks. FAR does find it, and even allows you to replace the text with line breaks. Top-flight technical support also.
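For the curious, here is roughly what that kind of search looks like in raw regex, sketched in Python rather than in FAR itself. The idea is to let any run of whitespace, line breaks included, stand between the words of the phrase. The helper name `phrase_pattern` is hypothetical.

```python
import re

def phrase_pattern(phrase: str) -> re.Pattern:
    """Build a regex that matches the phrase with any whitespace,
    including line breaks, between its words."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\s+".join(words))

text = "In a white room\n\nwith black curtains\n\nat the station"
pattern = phrase_pattern("room with black curtains at")

match = pattern.search(text)
print(repr(match.group(0)))  # 'room\n\nwith black curtains\n\nat'

# The replacement text can carry its own line breaks.
print(pattern.sub("hall with red curtains\nnear", text))
```

A search for the literal string, by contrast, fails because the spaces in the phrase don't match the line breaks in the text.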

Most of these are shareware, but they're well worth the US$25-$50.

(compiling CHMs, finding and replacing across line breaks)


25 September 2006

Doing the Localization Vendor's Work?

Sometimes I know too much about this process.

Or, maybe I'm just too nice a guy.

To make things easier for the vendor (and cheaper for me) I've resolved to carve the 3200 HTML files in the API Reference CHM into different buckets, depending on whether and how much they require translation vs. engineering. Naturally, the ultimate arbiter is the Trados or SDLX analysis that the vendor will perform, but I've already mentioned my concern about false positives and need write no more on the topic here.

My tool of choice is the extremely capable Beyond Compare which, at US$30, is worth it just to see how well thought-out a software package it is. I compare version 3.9 files against version 4 files, tuning the comparison rules to groom the file buckets as accurately as possible.

The distribution is not perfect, if for no other reason than because its first level of triage is the filename and not the file contents, but it's better than guessing, and it's much better than thousands of false positives.
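The triage described here, filename first and contents second, can be approximated in a few lines. This is a sketch of the general approach, not of Beyond Compare's own comparison rules. It assumes two flat folders of files, and the bucket names and the function `bucket_files` are hypothetical.

```python
import filecmp
from pathlib import Path

def bucket_files(old_dir: str, new_dir: str) -> dict[str, list[str]]:
    """Sort files into buckets by comparing two flat folders:
    first by filename, then byte-for-byte for names present in both."""
    old_names = {p.name for p in Path(old_dir).iterdir() if p.is_file()}
    new_names = {p.name for p in Path(new_dir).iterdir() if p.is_file()}

    buckets = {
        "added": sorted(new_names - old_names),    # only in the new version
        "removed": sorted(old_names - new_names),  # only in the old version
        "unchanged": [],
        "changed": [],
    }
    for name in sorted(old_names & new_names):
        # shallow=False forces a comparison of contents, not just file metadata
        same = filecmp.cmp(Path(old_dir) / name, Path(new_dir) / name, shallow=False)
        buckets["unchanged" if same else "changed"].append(name)
    return buckets
```

Because the first pass is by name, a renamed file lands in both "added" and "removed" even if its contents never changed, which is the imperfection mentioned above.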

Once I've gone through the files, I'll have a better idea of how to label the buckets in a way that meets both my needs and those of the vendor.

At least, I think I'm being too nice a guy. Maybe this is just a big pain for the vendor, and they're too polite to inform me of that.
