Perspective on XML: XML circles the globe -- ADTmag

By Uche Ogbuji
August 1, 2004

Developers are an ambitious bunch. The early rallying cry for Linux began with the words “world domination.” So how do software developers, whether community groups or corporate coders, go about such a conquest? Perhaps one can take a lesson from the myth of the Tower of Babel, a story so widespread across cultures that some take it as literal truth and others as an apt symbol of a basic pillar of the human psyche.

The ancient lessons drawn from this story concerned the vanity and inherent limitations of human nature. Lessons drawn in the modern era might be quite different. Perhaps we shall not colonize space until the current trend of disappearing languages leads to the supremacy of one global super-language (a creepy and terrifying thought). If our diversity of tongue and culture will never allow us to challenge divinity, the more modest goal of world domination is probably impossible without managing the problem of Babel. When developing software for a global audience, managing this problem is called internationalization, cleverly abbreviated “i18n” for the 18 letters between the initial “i” and the final “n.”

I never thought much about i18n until I worked as a consultant at IBM, which is legendary for its attention to global deployment of technology. I’ve always been one to pick up languages and adapt to cultures quickly, so I was surprised at how extensively i18n considerations reshaped my most basic development habits. i18n is not just a matter of ditching knee-jerk biases (although many in the U.S. could start with that basic step); it requires support from tools and frameworks and special attention in all development methodologies.

As with all important development problems, the key to success or failure lies in the data. i18n is a non-starter without data that supports (almost) all writing systems, text translations and structured fields rich enough to accommodate differing local conventions. That’s one reason why the success of XML is so encouraging. XML was born with i18n in its genes, and achieving the data characteristics I mentioned for i18n is often a matter of good XML design. The basic guidelines are simple enough to state, although they take some attention to get right.

Don’t undermine XML’s character model. One of the most misunderstood aspects of XML is its basis in Unicode. That is understandable, because Unicode is a tough subject for people who are used to thinking of strings from a European-language point of view (and even for many who are used to more complex character models). But don’t even think of using XML without understanding its concept of text: an XML document is a sequence of Unicode characters, whatever byte encoding it happens to be stored in. And if XML’s text model is too much for you, forget about developing software for a global community or market altogether, because any other i18n mechanism will involve at least as many complexities.
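To make the text model concrete, here is a small Python sketch using the standard library’s xml.etree.ElementTree (chosen only for illustration; the point holds for any conforming parser). The same document is encoded once as UTF-8 and once as UTF-16, and the parser decodes both to the identical string of Unicode characters, including the character reference &#x263A;. The template and the parse_as helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# One logical document; only the declared byte encoding differs.
TEMPLATE = ('<?xml version="1.0" encoding="{enc}"?>'
            '<greeting>Grüße, 你好, שלום &#x263A;</greeting>')

def parse_as(encoding):
    """Hypothetical helper: encode the document, parse it, return the element text."""
    data = TEMPLATE.format(enc=encoding).encode(encoding)
    return ET.fromstring(data).text

utf8_text = parse_as("utf-8")
utf16_text = parse_as("utf-16")

# The parser hands back characters, not bytes: both encodings yield the same string,
# and the character reference &#x263A; has become U+263A WHITE SMILING FACE.
print(utf8_text == utf16_text)           # True
print(len(utf8_text))                    # 17 characters
print(len(utf8_text.encode("utf-8")))    # 29 bytes
```

Note that the character count and the UTF-8 byte count differ; code that conflates the two is exactly where the trouble described next begins.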

The most common problem I come across with regard to XML and Unicode arises when applications extract data from XML text into data structures that can’t handle the complexity of Unicode (e.g., simple strings in C). All seems well in testing because the testers use only ASCII or European-language test data. Then the software is deployed, a Chinese user enters input, and the application fails. Even more pernicious is the occasional error propagated from a source that developers might consider authoritative. I recently examined a fairly well-known XML tool whose default configuration does not allow characters from non-European languages. Its non-compliance is so blatant that it is not really an XML tool at all, yet it is advertised as one, and unwary users may not appreciate its limitations.
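The failure mode is easy to reproduce outside C. In this Python sketch, a max_bytes limit stands in for a fixed-size byte buffer (an assumption for illustration). Forcing a Chinese name into ASCII fails outright, and copying an arbitrary number of UTF-8 bytes cuts a character in half. The hypothetical truncate_utf8 helper shows one defensive alternative: truncate only at a character boundary.

```python
import xml.etree.ElementTree as ET

def truncate_utf8(text, max_bytes):
    """Hypothetical helper: shorten text so its UTF-8 form fits in max_bytes,
    never splitting a multi-byte character."""
    clipped = text.encode("utf-8")[:max_bytes]
    # Only the end of the slice can hold an incomplete sequence; "ignore" drops it.
    return clipped.decode("utf-8", errors="ignore")

name = ET.fromstring("<user><name>王小明</name></user>").findtext("name")
print(len(name), len(name.encode("utf-8")))   # 3 9  (characters, bytes)

# Naive approach 1: assume the text is ASCII.
try:
    name.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII-only storage fails:", err.reason)

# Naive approach 2: copy the first 8 bytes, as a fixed-size buffer would.
try:
    name.encode("utf-8")[:8].decode("utf-8")
except UnicodeDecodeError as err:
    print("byte truncation split a character:", err.reason)

print(truncate_utf8(name, 8))   # 王小
```

Testing with data like this, rather than ASCII alone, is what exposes the problem before deployment.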

The XML-RPC specification is a similar case. XML-RPC is a popular protocol for exchanging XML data over HTTP, but its specification makes the fairly ridiculous stipulation that all strings sent by way of XML-RPC must be ASCII. Luckily, most XML-RPC implementations ignore this limitation and allow the full range of Unicode characters to be sent, but such scorn for non-English users in an XML-based specification causes a great deal of confusion.
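As one data point, Python’s standard xmlrpc.client module does not enforce an ASCII restriction. This sketch marshals a call with a non-ASCII string argument and reads it back; the method name greet is a made-up example, and no server is involved.

```python
import xmlrpc.client

# Marshal a call to a hypothetical "greet" method with a non-ASCII argument.
payload = xmlrpc.client.dumps(("你好, 世界",), methodname="greet")
print(payload)   # the Chinese text appears as-is inside <string>...</string>

# Unmarshal it again: the full Unicode string survives the round trip.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

Other implementations may behave differently, so it is worth checking whichever library you actually deploy.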

Another recommendation, and a much harder one to follow, is to ensure that your software handles translated versions of text. XML defines a standard attribute, xml:lang, that identifies the language of an element’s content, which allows a document to carry multiple instances of an element, each in a different language. Be sure not to block this usage (e.g., through schema constraints), and be sure your tools respond intelligently so that, say, a Hebrew-speaking user is presented with text that has been translated into Hebrew where available. XPath, the most important little language for XML processing, provides some support for this through its lang() function.
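Python’s ElementTree does not implement XPath’s lang() function, so this sketch hand-rolls a hypothetical lang_matches helper modeled on the lang() rule: a case-insensitive match, or a match on the requested language followed by a hyphen (so “fr” matches “fr-CA”). One simplification: unlike lang(), it checks only the element’s own xml:lang attribute instead of inheriting the value from ancestors. The hypothetical pick_translation function tries the user’s preferred languages in order before falling back to a default.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def lang_matches(tag, wanted):
    """Approximate XPath 1.0 lang(): exact match or sublanguage, ignoring case."""
    tag, wanted = tag.lower(), wanted.lower()
    return tag == wanted or tag.startswith(wanted + "-")

def pick_translation(parent, child_name, preferred, fallback="en"):
    """Return the text of the first child whose xml:lang matches a preferred
    language (in order), then the fallback; None if nothing matches."""
    candidates = parent.findall(child_name)
    for wanted in (*preferred, fallback):
        for el in candidates:
            if lang_matches(el.get(XML_LANG, ""), wanted):
                return el.text
    return None

doc = ET.fromstring("""<product>
  <title xml:lang="en">Water bottle</title>
  <title xml:lang="he">בקבוק מים</title>
  <title xml:lang="fr-CA">Bouteille d'eau</title>
</product>""")

print(pick_translation(doc, "title", ["he"]))   # בקבוק מים
print(pick_translation(doc, "title", ["fr"]))   # Bouteille d'eau
print(pick_translation(doc, "title", ["ja"]))   # Water bottle (fallback)
```

Nothing in the document structure has to change to add a language; a schema that allowed only one title element would break this pattern.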

One last thing I recommend is to design structures and conventions in your XML to accommodate varying cultural norms. Internationally respected specifications are a good source for such structures and conventions.

As an example, when modeling people’s names, accommodate the fact that some cultures prefer to display or sort by given name and others by family name, and that additional names and titles are essential in some cultures. DocBook is a good specification to emulate in this regard. Other examples include dates (use the ISO 8601 format, YYYY-MM-DD, rather than, say, DD/MM/YYYY), numerals (be aware that different countries use commas and periods differently within numerals, so the same quantity is written 1,234.5 in the U.S. and 1.234,5 in Germany), currency, addresses and telephone numbers.
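A short Python sketch of the date and numeral advice. It first shows why a slash-separated date is ambiguous, then stores values in locale-neutral forms (an ISO 8601 date and a plain decimal with a period, which are also the lexical forms XML Schema uses for xs:date and xs:decimal) and applies local conventions only for display. The order element and the format_number helper are hypothetical illustrations; real software would draw separators from locale data rather than hard-coding them.

```python
from datetime import date, datetime
from decimal import Decimal
import xml.etree.ElementTree as ET

# The same string yields two different dates depending on local convention.
ambiguous = "04/05/2004"
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()    # 4 May 2004
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # April 5, 2004

# Store locale-neutral forms in the XML itself (hypothetical <order> element).
order = ET.fromstring('<order date="2004-08-01" total="1234.50"/>')
order_date = date.fromisoformat(order.get("date"))
total = Decimal(order.get("total"))

def format_number(value, group_sep, decimal_sep):
    """Hypothetical display helper: two decimal places, caller-chosen separators."""
    text = f"{value:,.2f}"  # Python's locale-independent form, e.g. "1,234.50"
    # Swap separators through a placeholder so the two replacements don't collide.
    return (text.replace(",", "\x00")
                .replace(".", decimal_sep)
                .replace("\x00", group_sep))

print(day_first, month_first)           # 2004-05-04 2004-04-05
print(order_date.isoformat())           # 2004-08-01
print(format_number(total, ",", "."))   # 1,234.50  (U.S. style)
print(format_number(total, ".", ","))   # 1.234,50  (German style)
```

Keeping the stored form neutral means the XML never has to be reinterpreted when it crosses a border; only the presentation layer changes.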

There is a great deal more to i18n than taking proper advantage of the facilities in XML, but doing so is a first step toward managing the Babel problem. I’ll leave to the shamans the question of whether this will actually lead to world domination of your software products.

About the Author

Uche Ogbuji is a consultant and co-founder at Fourthought Inc. in Boulder, Colo. He may be contacted at [email protected].
