Thoughts on the Office XML Reference Schemas

Earlier this week, Microsoft popped out a rather unexpected press release: "Microsoft Announces Availability of Open and Royalty-Free License for Office 2003 XML Reference Schemas". Read the press release, and you'll discover that Microsoft worked with the Danish government to make these schemas publicly available. The Danes put out their own press release on the subject. Apparently the initiative was driven by The Danish Software Strategy. That document makes fascinating reading in its own right. The Danish government is charting a middle course between using proprietary and open source software, emphasizing XML and other open standards rather than political correctness for the software itself. This strikes me as eminently sensible.

Anyhow, back to the XML schemas. As you might already know, Word 2003, Excel 2003, and InfoPath 2003 can save documents as XML (InfoPath natively, Word and Excel if you choose to save as XML). This means that any XML generating or consuming tool can at least theoretically interoperate with Word and Excel. Indeed, it's been possible to do this just by saving Word or Excel documents as XML and then inspecting the results. What the new Microsoft Office 2003 XML Reference Schemas download adds to the picture is documentation. Right now the download contains information on WordprocessingML (which was formerly called WordML in some Microsoft documents); the Excel and InfoPath schemas are due to follow on December 5.
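To make the "any XML consuming tool" point concrete, here is a minimal sketch in Python that pulls the paragraph text out of a Word 2003 XML document using nothing but the standard library. The namespace URI and the element names (wordDocument, body, p, r, t) are my assumptions about WordprocessingML rather than anything quoted above, so check them against the downloaded schema; the sample document and the paragraph_texts helper are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed WordprocessingML 2003 namespace; verify against the reference schema.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# A tiny hand-written sample standing in for a file saved from Word as XML.
SAMPLE = """<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>First paragraph.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>
"""


def paragraph_texts(xml_text):
    """Return one string per w:p element, joining the text of its w:t runs."""
    ns = {"w": W_NS}
    root = ET.fromstring(xml_text)
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


if __name__ == "__main__":
    for line in paragraph_texts(SAMPLE):
        print(line)
```

A real document saved from Word carries a great deal more (styles, fonts, document properties), but the same namespace-aware lookup should apply if the assumptions above hold.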

Download and install the schemas (a process that was slightly annoying here, since the installer is hard-coded to use drive c: and my development box doesn't have a drive c:) and you'll get the applicable XML schemas, a help file that documents everything, and a Word document that explains some common scenarios. It really is everything you need to build up XML files that Word will see as perfectly valid Word documents, complete with the schemas that you need to validate your work.
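Going the other direction, building a file for Word to open, might look something like the following Python sketch. Again, the namespace URI, the element names, and the mso-application processing instruction are assumptions on my part to be verified against the schema documentation, and build_word_xml is a hypothetical helper, not anything shipped in the download. Python's standard library has no XML Schema validator, so this only produces well-formed XML; validating against the downloaded schemas would need a separate tool.

```python
import xml.etree.ElementTree as ET

# Assumed WordprocessingML 2003 namespace; verify against the reference schema.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a local element name with the (assumed) Word namespace."""
    return "{%s}%s" % (W_NS, tag)


def build_word_xml(paragraphs):
    """Build a bare-bones document: one paragraph, run, and text node per string."""
    root = ET.Element(w("wordDocument"))
    body = ET.SubElement(root, w("body"))
    for text in paragraphs:
        para = ET.SubElement(body, w("p"))
        run = ET.SubElement(para, w("r"))
        ET.SubElement(run, w("t")).text = text
    # The processing instruction is what (by assumption) tells Windows to
    # hand the file to Word rather than to a generic XML viewer.
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
        + ET.tostring(root, encoding="unicode")
    )


if __name__ == "__main__":
    doc = build_word_xml(["Hello from a script.", "No copy of Word was involved."])
    with open("hello.xml", "w", encoding="utf-8") as handle:
        handle.write(doc)
```

Whether Word is happy with a document this sparse is exactly the sort of thing the reference schemas and help file are there to answer.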

Yet all may not be quite rosy in the land of openness. There are two potentially troubling things in the legalese that comes with the schemas (and please bear in mind that I'm a developer, not a lawyer; I would welcome corrections or clarifications from anyone who knows better). First there's the matter of the associated Office 2003 XML Reference Schema Patent License. The schema download itself contains language that lets you copy and distribute the schema, subject to certain limitations (mostly that you need to properly credit it and link to a particular page at Microsoft). But the download doesn't grant you the right to implement a program that can use the specifications. That's the purpose of the patent license.

This whole Patent License business is a bit troubling to me, as it starts off by saying "Microsoft may have patents and/or patent applications that are necessary for you to license in order to make, sell, or distribute software programs that read or write files that comply with the Microsoft specifications for the Office Schemas." It then goes on to say "Except as provided below, Microsoft hereby grants you a royalty-free license under Microsoft's Necessary Claims to make, use, sell, offer to sell, import, and otherwise distribute Licensed Implementations solely for the purpose of reading and writing files that comply with the Microsoft specifications for the Office Schemas." You need to display a license notice, you can't sublicense, and "You are not licensed to distribute a Licensed Implementation under license terms and conditions that prohibit the terms and conditions of this license."

Also, within the schema license itself you'll find this language: "No right to create modifications or derivatives of this Specification is granted herein."

So where's the problem? Well, first off, it seems possible that the bit in the Patent License about not being licensed to distribute under other license terms is a clause designed to prevent applications that use the GNU General Public License (GPL) from implementing Office XML compatibility. To be fair, Eben Moglen, the counsel to the Free Software Foundation (which keeps an eye on the GPL), says he doesn't think there's an incompatibility. But if I were writing open source software, I'd think twice before using these schemas.

Second, what the heck can you patent in an XML schema? XML Schema itself is, of course, an approved recommendation of the World Wide Web Consortium. It's all right out in the open and understood by thousands of applications. Patents are supposed to be for non-obvious innovations (though many recent abuses of the system make it all too clear that the US Patent and Trademark Office is simply incompetent to judge the merits of software patents), so what's non-obvious about using an XML schema to describe an XML document? I don't get it. A search at the Patent Office for "XML Schema AND Microsoft" turns up ten patents. None of them look especially applicable to me, but reading patents is a minefield for the layman so I could easily be wrong.

Finally -- and most troubling to me -- there's this whole business of the license only granting you permission to "read and write files that are fully compliant" with the specification, and not letting you create modifications or derivatives. Correct me if I'm wrong, but doesn't the X in XML stand for "Extensible" (we'll talk about the inability of developers to spell another day)? It seems like you have to implement the whole schema and nothing but the schema to avoid running afoul of the license. That sure doesn't seem extensible to me.

Maybe I've just spent too much time hanging around with open source folks, and it's made me paranoid. Maybe the Office team always intended to make everything open, and just forgot until the Danish government reminded them. Maybe the developers at Microsoft want us to use the schema to build interoperable applications, and understand that we might extend or only use part of the schema. Maybe everything was fine until the lawyers got involved and marked everything up. But given Microsoft's past behavior, I certainly think these are questions worth pondering. Hopefully Microsoft will move forward on openness and interoperability, and the lawyers will get out of the way, and we can all write happy smiling software that plays well together.

About the Author

Mike Gunderloy has been developing software for a quarter-century now, and writing about it for nearly as long. He walked away from a .NET development career in 2006 and has been a happy Rails user ever since. Mike blogs at A Fresh Cup.