04 December 2008

Localizing DITA Projects

Have you seen DITA projects land in your inbox yet? The full promise of XML is about to become your next headache.

If you don't know what DITA is, here's the thumbnail from the Open Toolkit's User Guide:
"DITA (Darwin Information Typing Architecture) is an XML-based, end-to-end architecture for authoring, producing, and delivering information (often called content) as discrete, typed topics."
In short, the source content you hand off for localization lives in XML files. If you get to the party soon enough, you can help your own cause by asking the authors to use specific XML tags in their authoring to make it easy for you to find text you need to translate and to ignore text you don't need to translate. The authors will surely fall all over themselves to make you happy with this new technology, so take advantage of it while it's still novel.
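To make that request concrete: DITA elements accept a translate attribute, and authors can set translate="no" on anything that must stay in the source language. Here is a minimal sketch, in Python, of how you might pull only the translatable text out of a topic. The sample topic and the function name translatable_segments are invented for illustration, and a real extraction filter in a translation tool will handle far more cases than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic, invented for this sketch.
SAMPLE_TOPIC = """<topic id="install" xml:lang="en">
  <title>Installing the product</title>
  <body>
    <p>Run <codeph translate="no">setup.exe</codeph> to begin.</p>
    <p translate="no">ACME-1234-XYZ</p>
  </body>
</topic>"""


def translatable_segments(element, inherited=True):
    """Yield text chunks that are not inside a translate="no" element."""
    flag = element.get("translate")
    # An element without the attribute inherits its parent's setting.
    translate = inherited if flag is None else (flag != "no")
    if translate and element.text and element.text.strip():
        yield element.text.strip()
    for child in element:
        yield from translatable_segments(child, translate)
        # Tail text sits after the child but belongs to this element,
        # so it follows this element's setting, not the child's.
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


segments = list(translatable_segments(ET.fromstring(SAMPLE_TOPIC)))
print(segments)
```

The file name and the part number stay out of the translator's way, and everything else comes through. Note that the sentence around the file name arrives in two pieces, which is exactly the kind of segmentation question to raise with the authors early.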

The problem with XML is that it's ugly and nobody can use it as documentation in that format, so it needs to be transformed into HTML, PDF, CHM, XHTML, or some other format that people will actually read. The DITA Open Toolkit is an open-source means for performing this transformation, using Ant build scripts and XSLT stylesheets to shape the content.

Your problem as a localization professional is not in the XML; it's in the transformation.

How do you know that the scripts your writers use for the source language (let's say, English) will work when you have to run them on XML files translated into Korean or Hebrew or Russian? (Well, they will run; the question is whether the result is good or garbage.)

With a kit like the Open Toolkit, things run as advertised when used right out of the box. The open-source project even devotes a chapter of its user guide to "Localizing (translating) your DITA content," and its maintainers are kind enough to provide pre-translated text like "Parent Topic," "Previous," and "Next," which you can select by setting the xml:lang attribute. The tricky part lies in the customization.

One Tech Pubs team engaged a team of script programmers to customize the toolkit. They've introduced strings like "Copyright Statement" and "Enter keyword" and placed a "Last updated" datestamp on every page in the help project. They've also implemented a search function (gulp!) so users can locate content in the help files. There's nothing wrong with this customization work, except that nobody was thinking of other languages while doing it. Now we're sorting out the location of the custom strings, the way to get the toolkit to format dates according to locale, and how to convince the search function that characters can take up more than one byte.
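The byte problem is easy to demonstrate. The sketch below is in Python and has nothing to do with the team's actual search code; the Korean word is just a sample term. It shows why any logic that counts or cuts by bytes goes wrong as soon as the text is not plain ASCII.

```python
# A sample search term: the Korean word for "help".
term = "도움말"

char_count = len(term)                  # characters a user sees
byte_count = len(term.encode("utf-8"))  # bytes on disk in UTF-8

# Cutting the UTF-8 bytes at an arbitrary offset can split a character.
truncated = term.encode("utf-8")[:4]
try:
    truncated.decode("utf-8")
    survived = True
except UnicodeDecodeError:
    survived = False

print(char_count, byte_count, survived)
```

Three characters, nine bytes, and a four-byte cut leaves garbage. A search index, a highlighter, or a "first 80 characters" summary that assumes one byte per character will hit the same wall in Korean, Hebrew, or Russian.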

You will face the same problems. You'll need to internationalize your writers' customizations so that things work properly in your target language.
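What does internationalizing a customization look like? Roughly this: pull the custom strings and date patterns out of the scripts, key them by language, and fall back to English when a language is missing. The sketch below is Python, not the toolkit's own mechanism. The table, the function names, and the translations in it are placeholders I made up for illustration, and the date patterns are common conventions rather than vetted locale data.

```python
from datetime import date

# Hypothetical string table; translations and patterns are placeholders.
STRINGS = {
    "en": {"last_updated": "Last updated", "date": "{m:02d}/{d:02d}/{y}"},
    "ko": {"last_updated": "최종 업데이트", "date": "{y}년 {m}월 {d}일"},
    "ru": {"last_updated": "Последнее обновление", "date": "{d:02d}.{m:02d}.{y}"},
}


def lookup(lang, key):
    """Try the full tag (ko-kr), then the bare language (ko), then English."""
    tag = lang.lower()
    for candidate in (tag, tag.split("-")[0], "en"):
        if key in STRINGS.get(candidate, {}):
            return STRINGS[candidate][key]
    raise KeyError(key)


def datestamp(lang, when):
    """Build the 'Last updated' line for the given language."""
    label = lookup(lang, "last_updated")
    pattern = lookup(lang, "date")
    return f"{label}: {pattern.format(y=when.year, m=when.month, d=when.day)}"


print(datestamp("ko-KR", date(2008, 12, 4)))
print(datestamp("he", date(2008, 12, 4)))  # no Hebrew entry, falls back to English
```

The fallback keeps the build from breaking when a language has no entry yet, but it also hides untranslated strings, so a real project would probably want to log every fallback it takes.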

So when your writers tell you how much easier your life will be now that content is in XML, don't forget to look a bit further down the road at what they're using to transform that XML into something useful. That's where you'll put in the hours.
