Interoperability and the binary <> ODF conversion dilemma

There are two important issues to consider, especially wherever there are government pilot studies and ODF mandate proposals. The first is whether we can convert volumes of existing MSOffice binary documents to ODF with an acceptable loss of fidelity. The second is whether ODF plug-ins for MSOffice can convert documents with enough fidelity that MSOffice-bound business processes can continue without costly disruption or re-engineering.


For the sake of discussion, these issues are often referred to as “compatibility with existing documents” and “interoperability with existing applications”. Or, as many do, both are lumped together as “interoperability”.


ODF must pass this real-world test to be considered “implementable” or to have any hope of success in the marketplace. Unfortunately, for much of the world, ODF isn't a clean-slate implementation. Conversion of binary documents to ODF is a fact of life that can't be avoided.


Fortunately, the reverse engineering and conversion of the MS binary formats is a sector as rich with success as it is rife with MS-inspired turmoil, confusion, and obfuscation. Yes, the binaries have been a moving target. But the conversion sector routinely achieves 85-95% fidelity or better on import. Export, however, is a far more difficult issue.


Given the importance of being able to convert MS binaries to ODF, one would think that the OASIS ODF TC would do whatever it takes to improve conversion fidelity for both import and export. Sadly, this is not the case. Over the past five years some of the world's foremost conversion and reverse-engineering experts have worked on the ODF TC, only to see their efforts defeated or pushed out to future generations of ODF.


Two names in particular stand out: Phil Boutros and Florian Reuter. Phil was responsible for the outstanding Stellent conversion filters, considered by most to be the best in the business. Florian wrote the da Vinci ODF plug-in for MSOffice and the OOXML conversion plug-in for Novell's OpenOffice. Florian also represented Novell on the OASIS ODF TC, the Ecma 376 Workgroup, the Cleverage open source “Translator” project, and the EU-ISO authorized DIN Workgroup “Harmonization” study. Prior to his work with Novell and the OpenDocument Foundation, Florian was Sun's resident RTF and MS binary conversion expert, tasked with assisting and advising OpenOffice/StarOffice community developers.


That the many conversion-sector-inspired efforts to improve “compatibility-interoperability” within the OASIS ODF TC were defeated or kicked forward to future generations of ODF is a matter of record. But here's something else to consider. Microsoft is claiming that Sun and IBM have had access to the binary blueprints since 2003. I personally don't know if that's true. But what I do know is that Sun and IBM have had the binary blueprints since early 2006. And I do know that in the past two years, neither Sun nor IBM has introduced any proposals to enhance ODF interoperability with MSOffice, or compatibility with the billions of binary documents the marketplace seeks to convert to ODF.


So much for the value of the binary blueprints.


If you want to truly understand this difficult “compatibility-interoperability” issue, the best place to look is at independent conversion efforts. Sadly, all you'll ever get from the application vendors is a lot of useless finger pointing, heated politics, and refusal to compromise or cooperate. Their agenda is not interoperability with the enemy's product line.


And what do the independent conversion efforts tell us? They scream loudly that there is a fundamental difference between how OpenOffice and MSOffice implement basic document structures such as lists, tables, fields, sections, and page dynamics. These basic layout engine differences are further complicated by differences in feature sets and in business process and development environments.


After five years of working on these problems, I'm not hopeful this could ever be sorted out. After a full year of study, the DIN Workgroup recently notified ISO JTC-1 that harmonization of ODF and OOXML is impossibly difficult. It's a preliminary opinion needed for the Geneva BRM discussions; the full report is expected to be filed with ISO sometime within the next six months. Hopefully it will be made public. I also suspect that the preliminary report has something to do with ISO ODF editor Patrick Durusau's recent statements that so shocked the ODF community. I've known Patrick since work on OASIS ODF began in 2002, and he definitely believed that one-to-one mapping between ODF and OOXML was possible. Then, out of the blue, he switched to this “dual channel” yet cooperative development idea. And he certainly knows about the DIN Workgroup report. In fact, Patrick was a guest speaker, along with reps from Sun, IBM, and Microsoft, at the EU-IDABC sponsored Berlin ODEF Workshop where the DIN Workgroup “harmonization” study was first commissioned.


As a bit of background, ODEF stands for “Open Document Exchange Format”; the Open Document name having been first used by the EU-IDABC as far back as 1985 when they came up with ODA, the “Open Document Architecture”. ODA was unable to compete with SGML, which went on to spawn HTML, CSS, (X)HTML, and XML subsets.


While it's true that IBM and Sun declined to cooperate with the DIN Workgroup, leaving Microsoft and Novell as the only participating application vendors with significant market share, the findings are nevertheless consistent with everything the independent conversion and translator sector has found. One-to-one mapping, harmonization, and/or convergence of the two file formats is impossibly difficult, and would demand a harmonization of the originating applications, MSOffice and OpenOffice.


IMHO, the real problem is that both ODF and OOXML began life as an XML encoding of the originating application's binary dump. Neither started life as a clean-slate, generic, document-structure-focused format. Neither started life as an application-platform-vendor independent effort. OOXML is a black hole leading only to an MS stack of application-bound development and cloud services. ODF will likely continue to struggle with interoperability for as long as OpenOffice source-code-based vendors drive the OASIS ODF bus, insisting that all interop enhancement proposals be implementable by OpenOffice before they will agree to include them in ODF.


I say “likely” and hesitate here because Novell Office recently broke from OpenOffice ODF and implemented a much-enhanced fields model as an application feature, greatly extending the ODF fields specification. The new fields model enables advanced formatting of field content when merging data into an ODF document, and is very compatible with how MSOffice implements fields, but very much incompatible with the OpenOffice method. This is also a case where end user demands for MSOffice-comparable functionality trump the interop concerns of an OpenOffice-specific ODF. How the OASIS ODF TC handles this gaping hole remains to be seen, but as far as Novell Office/SAP users are concerned, the new field functionality ranks as “must have”.


One thing we can all agree on is that the purpose of ISO/IEC standardization is interoperability. Document format standardization success must be measured in terms of “interchange”: the ability of many applications to exchange documents without loss of fidelity, content, or data.


While we should applaud the heroic efforts to fully document and genericize OpenOffice ODF, the best we can say is that as an “interchange” format, ODF is a work in progress, as demonstrated by the fateful Barcelona ODF Interoperability Workshop. ODF 1.0 (ISO 26300) in particular is woefully undocumented in three important areas: numbered lists, formulas, and the presentation layer.


Much of the work on ODF over the past five years has been that of first documenting OpenOffice ODF, and then “genericizing” the implementation model for document structures. To demonstrate how difficult this process is, consider that KOffice has been a participating member of the OASIS ODF TC for the past five years, and they still can't exchange documents with OpenOffice. The loss of fidelity is unacceptable by any measure.


At the 2007 OpenOffice Conference in Barcelona, Spain, the interoperability problems of ODF were put on public display, with Lotus Symphony (OOo 1.1.4), OpenOffice, Novell Office, Google Docs, and KOffice developers demonstrating that only the simplest of documents can be exchanged. Government consultants in attendance were joking that ODF “interchange” was limited to documents with the equivalent of HTML 2.0 formatting! ODF interop was so embarrassingly lacking that IBM's Doug Heintzman used the occasion of a ComputerWorld interview to issue what many see as an apology. In the article “Can IBM save OpenOffice.org from itself?”, written in the immediate aftermath of the Barcelona ODF “Interoperability Workshop”, Heintzman said, “I hope the story coming out of Barcelona isn't a dysfunctional community story, but rather a [story about a] potentially significant and meaningful community with considerable potential that has lots of room for improvement....”


Since OOXML makes no bones about the fact that its defining purpose is to represent MSOffice-specific documents in XML, ISO has no business considering OOXML as an “interchange” candidate. The problem is that ISO approved an OpenOffice-specific version of ODF, albeit with the qualification that ODF be brought into conformance with ISO “interoperability requirements”. A May 2006 ISO Directive was issued to effect this. Meanwhile, Microsoft seized the opportunity to push through their own incredibly application-specific format under the guise that ISO/IEC could similarly fix it to meet “interchange” requirements.


Conversion fidelity problems go back to the root problem of the originating applications having different and often irreconcilable layout models. While OpenOffice ODF and MSOffice OOXML both do a fine job of separating content and presentation, it is the presentation layers that stubbornly remain application-specific, entirely reflecting the originating applications' feature sets and layout models. You can't harmonize the formats without also harmonizing the originating applications!


There is, however, a way out of this difficulty. Although it's impossible to achieve acceptable conversion fidelity between two application-specific formats, it is possible to hit a very high conversion fidelity between an application-specific format and a generic format designed for highly interoperable “interchange”. This is the approach the OpenDocument Foundation finally settled on. The problem was finding a generic format capable of holding the application-specific richness of both MSOffice and OpenOffice.


After much study and testing, we believe that the W3C family of (X)HTML and XML technologies has progressed to the point of being able to fully represent documents used by five important domains: desktop productivity environments; enterprise publication, content, and archive management systems; SOA; SaaS; and Web 2.0/cloud collaborative computing. Recently the W3C provided CDF-WICD as a profile-based means of strapping together the sprawl of (X)HTML, CSS, SVG, XForms, XSL, XSL-FO, and XML technologies with the intent of preserving a very high level of interoperability.


One area where both ODF and OOXML fail miserably is that of having a defined “interoperability framework”. The W3C's CDF stands in stark contrast, in that the starting point for CDF was a well-defined interoperability framework with a compliance test suite. W3C technologies have been amazingly application-platform-vendor independent since day one.


While we might not be able to do direct desktop-application-to-desktop-application document exchange with acceptable fidelity, we can see a time when the existing plug-in architectures of both MSOffice and OpenOffice can be exploited with W3C CDF-oriented “import-export” converters. At the higher level of the Web, the universal interoperability we all seek is actually possible, even with legacy desktop office suites at the editor-interface-workflow end.


One final point. WordPerfect Office has had the ability to do high-level (X)HTML-CSS conversions for the past three years. The recent WPO beta adds import conversion of both ODF and OOXML. Ever wonder why OpenOffice doesn't similarly support (X)HTML-CSS?


Hint, hint: CSS is a wonderfully portable, highly interoperable presentation layer that originated with HTML and has progressed to the very rich and all-encompassing CSS 3 specifications.


To answer that question, you have to go back to a time before the open sourcing of StarOffice, when Sun was considering writing a StarOffice browser. This would of course require an (X)HTML-CSS processing capability. And wouldn't it be nice if StarOffice could provide browser-ready, web-ready documents? It was here that the OpenOffice/StarOffice developers came to an important juncture. The cost of adjusting the StarOffice layout engine to produce interoperable (X)HTML-CSS was judged untenable. Besides, just prior to their 1999 purchase of StarOffice, Sun had joined AOL in the purchase of Netscape (November 1998).


So the decision was made to drop (X)HTML-CSS and any related browser plans, and instead focus on an XML encoding of the StarOffice binary document representation. Although Sun had decided to reuse W3C technologies wherever possible, for the XML presentation layer they chose the proprietary and application-specific “automatic-styles” instead of CSS.
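
To make the automatic-styles versus CSS point concrete, here is a minimal sketch (not a real converter, and not how any office suite actually does it) of translating ODF automatic styles into CSS rules. It assumes only what the ODF schema shows: automatic styles live in style:style elements whose *-properties children carry attributes from the fo: (XSL-FO compatible) namespace, and many of those local names, such as font-weight, color, text-align, and margin-top, coincide with CSS property names. The function name automatic_styles_to_css and the use of the style name as a CSS class selector are hypothetical choices for illustration. Everything in the style: namespace is simply dropped, which is exactly the application-specific residue a real converter would have to map by hand.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the ODF specification.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def automatic_styles_to_css(xml_text: str) -> str:
    """Hypothetical sketch: turn the fo:* properties of ODF automatic
    styles into CSS class rules.

    Only XSL-FO compatible attributes are carried over, because their
    local names largely match CSS property names. style:* attributes
    (e.g. style:text-underline-style) are skipped; they would need
    hand-written, application-specific mappings.
    """
    root = ET.fromstring(xml_text)
    rules = []
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        name = style.get(f"{{{STYLE_NS}}}name")
        declarations = []
        # Each style:style holds *-properties children (text, paragraph, ...).
        for props in style:
            for attr, value in props.attrib.items():
                if attr.startswith(f"{{{FO_NS}}}"):
                    css_property = attr.split("}", 1)[1]
                    declarations.append(f"{css_property}: {value};")
        if name and declarations:
            # Assumption: the ODF style name is reused as a CSS class name.
            rules.append(f".{name} {{ " + " ".join(declarations) + " }")
    return "\n".join(rules)


SAMPLE = """<office:automatic-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <style:style style:name="T1" style:family="text">
    <style:text-properties fo:font-weight="bold" fo:color="#c00000"
                           style:text-underline-style="solid"/>
  </style:style>
  <style:style style:name="P1" style:family="paragraph">
    <style:paragraph-properties fo:text-align="center" fo:margin-top="0.2in"/>
  </style:style>
</office:automatic-styles>"""

print(automatic_styles_to_css(SAMPLE))
```

Run against the sample, this should print one rule for T1 and one for P1, while the underline setting silently disappears. The fo: subset ports to CSS almost for free; it is the style: namespace, carrying OpenOffice's own layout model, that resists conversion.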


Rather than absorbing the cost of rewriting the internal OpenOffice/StarOffice legacy layout engine, Sun has time and time again turned to XSL Transformations as the primary implementation means behind what turns out to be a rather shallow reuse of existing W3C technologies. In most cases with ODF, W3C namespaces are themselves hijacked and used to fully constrain the implementation model, reflecting OpenOffice limitations, feature sets, and application-specific preferences.


The true cost of these accumulated decisions is that ODF documents are neither Web ready nor easily convertible.


Of course, the same can be said of OOXML, but with this important exception: Microsoft has a web-ready transition plan. But it's not a plan compatible with W3C, Firefox, and Apache Tomcat. In fact, the lack of W3C technologies is frighteningly indicative of what Microsoft has in store for the Internet. Rather than “embrace and extend”, we're looking at a wholesale, across-the-board replacement of W3C technologies with Microsoft proprietary methods and protocols.


The recently released Microsoft Office SDK offers us a good look at what the grand strategy behind OOXML is all about. In the SDK we see a component for the fluid conversion of OOXML to something called “fixed/flow”, which may prove to be the cornerstone for connecting the monopoly base of some 500 million MSOffice desktops to a Microsoft-only collaborative computing cloud: a cloud where Microsoft proprietary protocols, formats, components, and methods replace those used by the W3C, Firefox, and Apache triangle that sits as the foundation of the Google, Yahoo, Oracle, IBM, and Amazon cloud initiatives.


The first law of the Internet is that interoperability trumps all other concerns, including application-platform-vendor specific innovation. This law is reversed, however, and stood on its head when it comes to the MS cloud.


The “fixed/flow” static-interactive document component seems to be a direct replacement for PDF and (X)HTML+CSS. My guess is that Microsoft is pulling out all the stops for ISO/IEC approval of OOXML because the success of their cloud initiative depends on controlling the transition from MSOffice as the primary client/server editor-interface to MSOffice as the primary MS cloud editor-interface. ISO/IEC approval of OOXML establishes MSOffice as a standards-compliant editor-interface into cloud-hosted information systems.


By the time the world gets around to complaining about fixed/flow, XAML, Silverlight, the .NET libraries, C#, and the rest, the damage will have been done. The monopoly base will have been transitioned and reconnected. Web interoperability will very much depend on access to MS-only technologies. The Google cloud will remain trapped inside the browser. The MS cloud will fully exploit the desktop productivity environment.


ISO/IEC approval of OOXML is how Microsoft extends and leverages their current monopoly into the next generation of web-centric cloud computing. This is not a good thing for the future of the Internet as a universal interoperability platform used by all, owned by none.


Hope this helps,

~ge~