20 December 2006

Localization Conundrum

My client received a request from Korea for a localized version 6.5. There are two issues:

  1. It's going to cost a lot, because the last version localized into Ko was version 5.01.
  2. English is up to version 8, and the process of creating the help is much better than in 6.5. Should we include those enhancements in 6.5 Ko, even though they would take it out of parity with 6.5 En?

I experimented to see whether the improvements mattered to the localization process in general and the cost in particular. I re-created portions of version 5.01 help using version 5.01 Perl scripts, then re-created portions of that same help using version 6.5 Perl scripts. Then I handed both sets off for word-count analysis. The counts were within 2-3% of each other, so the cost savings in translation are not there.
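For illustration only, here is a rough Python sketch of that kind of word-count comparison. It is not the vendor's analysis (which would normally run against translation memory); the function names and the sample strings are hypothetical, and the counting rule (strip tags, split on whitespace) is an assumption.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-delimited words after replacing HTML tags with spaces."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return len(no_tags.split())


def percent_difference(count_a: int, count_b: int) -> float:
    """Return the gap between two counts as a percentage of the larger count."""
    larger = max(count_a, count_b)
    if larger == 0:
        return 0.0
    return abs(count_a - count_b) / larger * 100.0


# Hypothetical sample output from the older and newer scripts.
old_build = "<p>Click <b>Save</b> to store the current file.</p>"
new_build = "<p>Click <b>Save</b> to store the file.</p>"

if __name__ == "__main__":
    a, b = count_words(old_build), count_words(new_build)
    print(a, b, round(percent_difference(a, b), 1))
```

On real help sets, a difference of only a few percent would suggest, as it did here, that the choice of scripts barely moves the translation cost.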

However, I suspected that the vendor would charge me a lot more for engineering on the 5.01 help, because the version 6.5 scripts are much cleaner, and they handle the raw text much better. This compelled me to examine the matter further.

Better help or not, the problem is one of product management. Even if 6.5 help is "better," it differs too much from 5.01 help. I imagine a Korean customer struggling to bounce back and forth between 5.01 En and Ko, and puzzling over the discrepancies, even though the Ko version would have a lot more information than the En version.

They are the sort of discrepancies that make cowards of us all (albeit well-advised cowards). I've decided to hand off the pure 5.01 En help system for this project, warts and all.


16 December 2006

Favorite Localization Tools

Here's a short list of Windows-based tools I use a great deal in managing localization projects:

Beyond Compare - Clients constantly drill me about the differences between the last version of their product and this version, with an eye to the order of magnitude of localization expense they're in for. Beyond Compare is the best tool I've found for finding the files that have changed, then comparing older and newer versions of files in a specialized viewer. Good technical support as well.
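As a rough idea of the first step (finding which files changed between two versions), here is a minimal Python sketch. It is not how Beyond Compare works internally; the function names and the choice of SHA-256 content hashes are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to a hash of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(old_dir, new_dir):
    """Return sorted lists of (changed, added, removed) relative paths."""
    old, new = tree_digests(old_dir), tree_digests(new_dir)
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    return changed, added, removed
```

Calling compare_trees on the previous and current source trees gives a first estimate of how many files a localization update touches; the side-by-side viewing of each changed file is where a dedicated tool earns its keep.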

EmEditor - As long as you have the font and OS support installed, you can view multibyte characters in their appropriate applications under English-language Windows, but EmEditor allows you to change the encoding of a text file to better display it, or so that you can edit it. My standard text editor is UltraEdit, which has excellent search-and-replace capability, but it's not as deft as EmEditor for multibyte work on an English OS.
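To show what changing a file's encoding amounts to, here is a small Python sketch. It is only an illustration, not a description of EmEditor; the function name is hypothetical, and EUC-KR is assumed as the legacy Korean encoding.

```python
def reencode(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Decode bytes from one encoding and re-encode them in another."""
    return data.decode(source_encoding).encode(target_encoding)


# A Korean word ("Help") stored as legacy EUC-KR bytes (assumed encoding).
korean = "도움말"
legacy_bytes = korean.encode("euc_kr")

# The same text as UTF-8: identical characters, different bytes on disk.
utf8_bytes = reencode(legacy_bytes, "euc_kr")
```

The characters are the same before and after; only the bytes differ, which is why a file opened under the wrong encoding looks like garbage even though nothing is wrong with it.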

SDLX Glue - An obscure utility inside the SDLX suite, this will append up to I don't know how many hundred HTML files together. Translation vendors like it for work on big sites because it slashes the number of files being slung around. Naturally, it includes an unglue utility as well.
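The glue-and-unglue idea can be sketched in a few lines of Python. This is purely illustrative: the marker comment below is hypothetical and is not the format SDLX Glue actually uses, which I have not seen documented.

```python
import re

# Hypothetical delimiter written before each file's content.
MARKER = "<!-- GLUE-FILE: {name} -->"
MARKER_RE = re.compile(r"<!-- GLUE-FILE: (.+?) -->\n")


def glue(files: dict) -> str:
    """Concatenate {filename: html} into one string, one marker per file."""
    return "".join(
        MARKER.format(name=name) + "\n" + body for name, body in files.items()
    )


def unglue(glued: str) -> dict:
    """Split a glued string back into {filename: html}."""
    parts = MARKER_RE.split(glued)
    # parts[0] is the (empty) text before the first marker; after that,
    # captured file names and file bodies alternate.
    return dict(zip(parts[1::2], parts[2::2]))
```

The point of the round trip is that the vendor handles one big file while the site's file names and boundaries survive translation untouched.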

FAR - A technical writer introduced me to this utility, which includes a compiler system for HTML Help and MS Help. It will compile CHM files in any language such that, if you have a good HTML authoring tool, you don't need RoboHelp to build your CHMs. (Unfortunately, I've had problems when I've tried to use FAR on projects that have been created in RoboHelp, but there are some ways around them.)

Moreover, FAR stands for "Find And Replace", and this is hands down the best front end on regular expressions that I've ever found. The Holy Grail of search-and-replace is ignoring line breaks, and while regex supports that, not many utilities (that I've found) implement it. For instance, in the text

In a white room

with black curtains

at the station

if your goal was to find "room with black curtains at", most utilities would not be able to locate it because of the line breaks. FAR does find it, and even allows you to replace the text with line breaks. Top-flight technical support also.
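For the curious, here is a minimal Python sketch of searching and replacing across line breaks. It is not FAR's implementation; the helper name is hypothetical, and the technique shown (letting any run of whitespace, including line breaks, stand between the words) is just one common way to do it with regular expressions.

```python
import re


def phrase_pattern(phrase: str) -> re.Pattern:
    """Build a regex that matches the phrase with any whitespace between words."""
    words = phrase.split()
    # \s+ matches spaces, tabs, and line breaks alike.
    return re.compile(r"\s+".join(re.escape(word) for word in words))


text = "In a white room\nwith black curtains\nat the station"
pattern = phrase_pattern("room with black curtains at")

# A plain substring search fails because of the line breaks.
found_plain = "room with black curtains at" in text

# The whitespace-tolerant pattern succeeds.
match = pattern.search(text)

# The replacement text may itself contain a line break.
replaced = pattern.sub("hall with red curtains\nnear", text)
```

Because `\s+` treats a line break like any other whitespace, the same pattern finds the phrase whether it sits on one line or is spread over three.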

Most of these are shareware, but they're well worth the US$25-$50.

(compiling CHMs, finding and replacing across line breaks)


07 December 2006

Localized Binaries - The Plot Thickens

The engineer has demonstrated that it is no longer possible to build just the resource binaries; it is now necessary to build the entire blinking product.

"Why is that?" I ask.

"We've improved the makefile," he replies. The makefile is a script used by the make command to build binaries.

"That doesn't feel like an improvement to me," I venture. "Why can't I just build the two or three resource binaries I need? I don't need all of the executables and other rot."

"Yes, well, we've improved the makefile."

"But there was a small, localized makefile that lived in each of the directories of the resource binaries I wanted. What happened to them?"

"We improved the main makefile by rolling all of those lower-level makefiles into it."

That's a hint to me that they improved it for the purpose of creating all of the files that go into the installer, but that's far more files than I want. It also means that it's probably going to take me half an hour now to build binaries that used to take about six seconds each.

Had they been following good I18n hygiene, they'd have asked themselves (or me, even) whether there were costs associated with consolidating all of the lower-level makefiles and eliminating the possibility of rebuilding except in this huge batch. The costs don't really affect them that much, though they'll slow me down considerably.

It's an "improvement."
