Does XML give away the keys to the warehouse?

Minimization is an important aspect of security. In the supply chain of goods businesses, one doesn’t lump all the raw materials, financial instruments and other paraphernalia of commerce along with the final goods in the retail location.

Besides other obvious problems, security would become impractical. Even the most super superstore limits its contents to material that would normally be expected to leave the premises in typical commerce.

For some reason, businesses have a hard time applying such a simple, pragmatic approach to data management.

If you’re a developer or manager on a project channeling data to, say, a commerce Web page, data security probably keeps you up nights.

One infamous vulnerability behind such nightmares is SQL injection. The attacker fills in normal commerce forms with cleverly constructed strings designed to trick the database into leaking sensitive information.
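To make the mechanics concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table, its columns, the sample rows and the function names are all hypothetical, invented for illustration. It shows a form value pasted into the SQL text leaking a column the page never meant to expose, and the same value passed as a bound parameter doing no harm.

```python
import sqlite3

# Hypothetical toy database standing in for the enterprise store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, item TEXT, card TEXT);
INSERT INTO orders VALUES ('alice', 'lamp', '4111-0000');
INSERT INTO orders VALUES ('bob', 'desk', '5500-0000');
""")

def orders_unsafe(customer):
    # The form value is pasted straight into the SQL text.
    sql = f"SELECT item FROM orders WHERE customer = '{customer}'"
    return conn.execute(sql).fetchall()

def orders_safe(customer):
    # The form value is bound as a parameter, so it is only ever data.
    sql = "SELECT item FROM orders WHERE customer = ?"
    return conn.execute(sql, (customer,)).fetchall()

# A "customer name" that closes the quote and appends a second query.
PAYLOAD = "x' UNION SELECT card FROM orders --"

leaked = {row[0] for row in orders_unsafe(PAYLOAD)}
print(leaked)                # card numbers, not items
print(orders_safe(PAYLOAD))  # no customer has that literal name
```

Binding parameters closes this particular hole, but it does nothing about the larger problem that the card numbers were within reach of the Web tier at all.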

Recent analysis reveals how readily attackers can compromise entire databases in this way. When the Web site is but a thin layer over a super-sized enterprise database (an all too common setup), the degree of vulnerability is staggering.

Accumulating data into ever bigger databases might make some aspects of development and management easier, but it makes things just as pleasant for the attacker.

Recently the specter of malicious input has come to the XML world. The so-called XPath injection attack is aimed at corporations that pack loads of information into XML files, which are then processed for the Web by XSLT or other XPath-based technologies. Again, clever input fools the processor into returning more data than designed.
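As a rough illustration, the sketch below uses Python's standard xml.etree.ElementTree, which implements only a subset of XPath. Because that subset has no "or" operator, the payload here relies on a parent step ("..") rather than the better-known always-true condition. The document, element names, credentials and function names are all hypothetical. An attacker who knows one valid login uses the password field to walk back up the tree and read another user's account.

```python
import xml.etree.ElementTree as ET

# Hypothetical user file; every name and value is invented.
DOC = """
<users>
  <user><name>alice</name><password>a-secret</password><account>ACCT-ALICE</account></user>
  <user><name>mallory</name><password>m-secret</password><account>ACCT-MALLORY</account></user>
</users>
"""
ROOT = ET.fromstring(DOC)

def lookup_unsafe(name, password):
    # Form values are pasted straight into the path expression.
    path = f".//user[name='{name}'][password='{password}']/account"
    return [e.text for e in ROOT.findall(path)]

def lookup_safer(name, password):
    # No expression is built from input; values are compared as plain strings.
    for user in ROOT.iter("user"):
        if user.findtext("name") == name and user.findtext("password") == password:
            return [e.text for e in user.findall("account")]
    return []

# Closes the password predicate, steps up to the parent, then selects alice.
PAYLOAD = "m-secret']/../user[name='alice"

print(lookup_unsafe("mallory", PAYLOAD))  # alice's account, via mallory's login
print(lookup_safer("mallory", PAYLOAD))   # nothing matches the literal string
```

Processors with full XPath support give the attacker far more room than this subset does, which only strengthens the point.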

While I don’t claim to have foreseen XPath injection attacks, it does strike me that this security problem is made possible by practices that I and others have always discouraged. One problem is the phenomenon of production XML as database dump. Developers love to create titanic XML files, often as monolithic dumps from databases. Sometimes they deploy such monsters to servers susceptible to the cleverness of attackers.

If someone does compromise the server, they can pilfer one file and have your information warehouse in their hands.

Some vendors suggest encryption as a way to secure XML data, but the XPath injection illustrates how this is but a cosmetic fix on a foundational problem. If XPath injection fools the processor into returning sensitive data, it will be conveniently decrypted for the attacker. With a clever enough XPath injection, they could end up with the entire file decrypted for them.

Does this mean that using XML automatically gives black hats keys to your sensitive data? Of course not. Paradoxically, the solution is to embrace the fact that XML is open data rather than cower behind the false bastion of obscurity. All data accessible through XPath at any time should be data you expect any party to be able to access, including attackers.

From an architecture point of view, this is a strong hint toward pipeline architecture for your XML applications. The idea behind XML pipelines is that rather than working monolithic XML data sets through monolithic processors, you break down your system into discrete stages, each of which only represents a small window into the overall data stream.

That way you can firewall sensitive data across pipeline stages so it’s impossible for any action within one stage (even actions as clever as XPath injection) to access more data than designed.
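One way to picture such a firewall is the two-stage sketch below, again in Python with the standard xml.etree.ElementTree. The stage names, the field allowlist and the sample data are assumptions made up for illustration, not a prescribed design. The first stage would run away from the Web server and copy only the approved fields into a new, smaller document. The Web-facing stage never receives anything else, so no path expression evaluated there can reach the card numbers.

```python
import xml.etree.ElementTree as ET

# Hypothetical full data set; it never leaves the back-office stage.
WAREHOUSE = """<customers>
  <customer id="1"><name>Alice</name><city>Boulder</city><card>4111-0000</card></customer>
  <customer id="2"><name>Bob</name><city>Denver</city><card>5500-0000</card></customer>
</customers>"""

# Assumed allowlist of fields the Web page is designed to show.
PUBLIC_FIELDS = ("name", "city")

def extract_stage(full_xml):
    """Back-office stage: copy only allowlisted fields into a new document."""
    src = ET.fromstring(full_xml)
    out = ET.Element("customers")
    for cust in src.iter("customer"):
        slim = ET.SubElement(out, "customer", id=cust.get("id"))
        for field in PUBLIC_FIELDS:
            child = cust.find(field)
            if child is not None:
                ET.SubElement(slim, field).text = child.text
    return ET.tostring(out, encoding="unicode")

def web_stage(slim_xml, path):
    """Web-facing stage: it only ever sees the slimmed-down document."""
    root = ET.fromstring(slim_xml)
    return [e.text for e in root.findall(path)]

slim = extract_stage(WAREHOUSE)
print(web_stage(slim, ".//customer[@id='1']/name"))  # designed query still works
print(web_stage(slim, ".//card"))                    # nothing there to find
```

The allowlist is the important design choice: a stage names what it passes along, so a field added to the warehouse later stays hidden until someone deliberately exposes it.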

Astrophysics has the concept of a light cone, the portion of space-time open to observation. Outside the light cone, the observer is blacked out by the heftiest firewall in the universe: the speed of light. You should design your pipelines so that as much sensitive data as possible is outside the light cone of each processing stage.

About the Author

Uche Ogbuji is a consultant and co-founder at Fourthought Inc. in Boulder, Colo. He may be contacted at [email protected].