27 July 2007

Virtual machines as localization testbenches

What are you doing for localized testbenches? Are you still partitioning hard drives or, worse yet, dedicating entire machines to a single language/platform combination? Lab getting a bit crowded and hot, is it? Consider using virtual machines, or VMs.

Microsoft's Virtual PC 2007 is free and uncrippled, so you can create, run and administer your own VMs. It's not bad software, and you can be sure that if there are any "special" tricks a VM should know about the Windows version hosting it, this product will know them. VMware's full product is not free, but the company offers a free VM player, so if somebody in your department has the full product and can create VMs for you, you can use them as you would a normal drive.

A VM is just a huge file (around 1GB for Windows 2000, 3GB for Windows XP, 6GB for Vista) that you mount and run as a "guest" session in its own window. It's like having a computer inside a computer, although it takes drive space, RAM and processor cycles away from the "host" system - the OS your computer normally runs. You can start and stop the VMs as you need them, and you can install almost any OS or language to run as a guest inside the VM; however, you do need to procure a legal copy of that OS/language combination.

Tiring of so much ancient kit lying around the lab, we've begun to migrate to VMs. They're not a panacea, but they make things like remote testing a good deal easier, they require less hardware, and they make dual-boot configurations irrelevant. There's quite a performance hit, unfortunately, and we're finding that VMware VMs are a bit more responsive than Microsoft's.
However, most localization testing is focused on UI and functionality rather than on performance, so this may not affect your lab unduly.

Creating one is not that difficult. Here's an example for Microsoft's Virtual PC 2007:
  1. Obtain a machine with about 2GB of RAM and at least 80GB of drive space, running, say, Windows XP English.
  2. Download and install MS Virtual PC 2007.
  3. Create a virtual machine, specifying Windows XP. For installation, give the VM 1GB of RAM; you can reduce that amount later if need be.
  4. Obtain the installation disk for the desired OS (e.g., Windows XP Japanese) and place it in the drive.
  5. Start the VM. As it opens in its new window, specify that you want to capture the CD drive (or the image file on the host machine, if you're mounting a .img).
  6. Installation will then take place as normal, with disk checking, file copying, and all the configuration and rebooting you would expect of an installation on a physical drive.
  7. A few GB later, you have a WinXP Japanese VM running as a guest on your WinXP English host. Install the Virtual Machine Additions to enable features like host-to-guest drag-and-drop.
Of note:
  1. Your VMs don't inherit domain information from the host machine, so if you want the VMs on the domain for things like advertised programs and SMS pushes, you'll need to arrange that separately with the network administrator.
  2. VM files only grow. There is a feature to "compact" the VMs, but in our experience the resulting file takes no less space on disk.
  3. Migrating from existing physical drives into a VM is a crapshoot. Even our most promising experiments have ended in the Blue Screen of Death, so don't expect to take an existing testbench and copy it into a new VM, any more than you would expect that to work when moving from a physical desktop to a laptop.
  4. There's an emulation layer between the VM and the hardware, so peripherals (USB devices, dongles) may not run the same way as they do on a physical drive.
  5. VMs are portable, so if you can get one to run on your desktop, you should be able to copy it to other computers and use it on them. For that matter, VMWare VMs can be hosted on servers and run remotely.


20 July 2007

Machine translation in action

Has your boss asked you to use Google or AltaVista or some other flavor of machine translation to lower your translation costs?

Here's somebody who has put his money where your boss' mouth is.

Controlled language website attracts visitors from 110 countries

www.muegge.cc, a website dedicated to demonstrating the value of controlled language authoring and machine translation (MT), has attracted visitors from more than 110 countries since its launch in the summer of 2006. One of the unique features of this website is the fact that it uses Google language tools to automatically translate the site's content into 15 language pairs such as German to English or English to Simplified Chinese. The website was created from the ground up for MT, and all text was written in compliance with the CLOUT rule set, a controlled language designed specifically for MT.

muegge.cc, E-mail: info@muegge.cc, Web: http://www.muegge.cc

How do they do it? By controlling the text that goes into the translation machine. The simpler, more predictable and better structured the text, the more likely the machine is to produce a satisfactory translation. In other words, machine translation would probably work better on a page of Hemingway than on a page of Shakespeare or Faulkner.
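For a feel of what "controlling the text" means in practice, here is a toy checker. The rules below are invented for illustration - the real CLOUT rule set is published separately and is more extensive - but the shape is the same: mechanical tests that flag constructions machines translate badly.

```python
# Toy controlled-language checker. The rules here are illustrative
# inventions, NOT the actual CLOUT rule set.
import re

MAX_WORDS = 25  # hypothetical sentence-length limit

def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    problems = []
    if len(sentence.split()) > MAX_WORDS:
        problems.append("sentence longer than %d words" % MAX_WORDS)
    if "(" in sentence:
        # Parenthetical asides confuse MT sentence parsing.
        problems.append("parenthetical aside")
    if re.search(r"\b(it|this|these|those)\b$", sentence.strip(" .")):
        # Pronouns with distant antecedents translate unpredictably.
        problems.append("dangling pronoun reference")
    return problems

print(check_sentence("Click Save."))                    # []
print(check_sentence("Click Save (or press Ctrl+S)."))  # ['parenthetical aside']
```

A writer (or a pre-flight script) runs every sentence through checks like these before the text ever reaches the translation engine.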

Don't forget, though: What you save in translation, you'll spend in whipping your writers into line. It may not look like real dollars, but it's time.

And time, as they say, is money.


13 July 2007

Where Translation Memory Goes to Die

Have you ever heard that you're better off not going into the kitchen at your favorite restaurant? You're likely to see a number of things you'd rather not associate with a place and a group of people you like.

The same may apply to your translation memory databases. Unfortunately, you don't have the luxury of ignoring them, because things could be dying in there and costing you money.

Let's start with this sentence:

Some interfaces use "redial function/redial context" semantics instead of using IRedial to specify both.

Any TM tool could store this string and its translation without problems. Suppose, though, that the sentence (segment, in TM terms) only looks contiguous when displayed in an HTML browser, which is a very forgiving viewer, and that the source is actually broken into three pieces:

1. Some interfaces use "redial function/redial context" semantics instead of using
2. [HTML tags] IRedial.htm [closing HTML tags] IRedial
3. to specify both.

The text comes from include files written by engineers for engineers, and no line is longer than 80 characters. The tags come from the well-intentioned Tech Pubs team, which struggles to introduce some organization, hyperlinking and search capability to the product. This is pretty bruising to TM, which relies on being able to fuzzily match new occurrences against old occurrences of similar text. When the full sentence comes through the TM tool, its correspondence to the three broken fragments in TM is sharply impaired, and you (or I, in this case) pay for it.
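The damage is easy to quantify. The sketch below uses Python's difflib as a stand-in for a TM tool's fuzzy-match score; the fragments mirror the example above, and the exact percentages a commercial TM tool reports will differ, but the pattern holds.

```python
# Why fragmented segments hurt fuzzy matching: score the intact
# sentence against the broken pieces a TM tool would have stored.
from difflib import SequenceMatcher

def match(a, b):
    """Crude fuzzy-match percentage between two segments."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)

new_segment = ('Some interfaces use "redial function/redial context" '
               'semantics instead of using IRedial to specify both.')

# What TM actually stored: three broken fragments (tags simplified).
fragments = [
    'Some interfaces use "redial function/redial context" semantics instead of using',
    '<a href="IRedial.htm">IRedial</a>',
    'to specify both.',
]

for f in fragments:
    print(match(new_segment, f))
```

Only the long first fragment scores anywhere near a billable fuzzy-match band; the short tail fragments score so low that the tool treats the sentence as new text, and you pay full rate for words you already translated.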

It gets worse. If an engineer pushes words from one line to the next between versions, or if the tags are modified, the match rates suffer in the same way.

I've huddled with engineers, Tech Pubs and the localization house on this matter several times, with little progress to show for it, but here's a new twist:

We've offshored one of these projects to a vendor in China. Their solution was to re-align ALL of the English-language HTML pages from the previous version to ALL of the translated HTML pages of the previous version, effectively re-creating TM. They report about 20% higher match rates after doing this. I think this is because they're embracing the broken, dead segments in TM and finding them in the source files for the new version.

This seems like a counterintuitive approach, but who can argue with the benefits?
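The vendor's approach can be caricatured in a few lines. This is a toy position-based alignment, not their actual process (commercial aligners work at sentence level with length and anchor heuristics), and the example strings are invented - but it shows the principle: pair the old source with the old target as-is, broken segments and all, so the new version's files match against segments in exactly the shape they actually occur.

```python
# Toy re-creation of a TM by aligning previous-version source pages
# with their translated counterparts, segment by segment, by position.
def realign(source_segments, target_segments):
    """Pair up segments by position, keeping only non-empty pairs."""
    tm = {}
    for src, tgt in zip(source_segments, target_segments):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:
            tm[src] = tgt
    return tm

# Invented example: segments extracted from matching old-version pages.
old_en = ["Click Save.", "", "The file is stored locally."]
old_ja = ["保存をクリックします。", "", "ファイルはローカルに保存されます。"]

tm = realign(old_en, old_ja)
print(len(tm))  # 2 entries recovered
```

Every fragment that survives unchanged into the new version - even a tag-riddled one - then comes back as a 100% match instead of a costly near-miss.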


06 July 2007

Localization Management: One Big Bag, or Several Smaller Ones?

Are you getting to the point where you need to consider decentralizing your localization project management efforts? That decision point usually comes with growth (more languages, more products, shorter lag-times). Here are some things to consider in distributed vs. centralized localization management models:
  • Trying to plant and cultivate this expertise in each group (QA, product teams, release engineering, sustaining engineering) is pretty tough. One prospect appeared to manage things in a decentralized manner like this, telling me that "the car drives itself," but I have my doubts. I've never seen an organization do it well and robustly without a lightning rod or "localization czar" who ran around pestering people company-wide.
  • Frankly, I think decentralization is difficult because, for all but the largest, best funded firms, it's just plain hard to find and keep that many individuals interested in driving international products. I suppose it could be built into the company's incentive structure, so that managers understood it was part of their charter to localize, but that wouldn't guarantee that they would actually care about it, and caring is at the heart of long-term localization management.
  • Until somebody really cares about it, most internationalization/localization projects have an out-of-band feel to them. It takes a long time before they feel familiar, and the sooner everybody knows that there's somebody (besides them) responsible for putting out the Japanese version of the product or Web site, the sooner everybody can get back to work.
For these reasons - and a few others - I counsel companies to dedicate a single project manager, preferably with domain expertise, to as many localization projects as practical, rather than to expect each project manager to handle the localization aspects of his/her own projects.

That model will last a good while in most companies.
