A Generalized Grammar for Three-way XML Synchronization

Track: Core Technologies, Publishing, Knowledge Management

Audience Level: Technical view

Time: Thursday, November 17 14:45

Author: Robin La Fontaine, DeltaXML Ltd

Author: Nigel Whitaker, DeltaXML Ltd

Keywords: Change Management, Content Conversion, Interoperability, Merge, Synchronization, Translation


Synchronization problems, though diverse, can be categorized into distinct classes. For example, collaborative document editing often involves complex merging of multiple XML documents, with rule sets such as "mark conflicting edits for manual attention but accept all other edits". A different rule set would typically be used in translation, where Original English, Modified English and Original Japanese texts must be synchronized. In both cases the processing can be described as a controlled merge of three data sets. In practice the merges fall into clear categories, but the way each merge is executed differs from case to case, so the rule sets need to be expressed and formalized.
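As an illustration only, the quoted rule set "mark conflicting edits for manual attention but accept all other edits" can be sketched for a single text value. The sketch assumes that one of the three inputs is a common ancestor (`base`) and that the other two (`edit_1`, `edit_2`) are independent revisions of it; the names `MergeResult` and `merge_value` are hypothetical and are not taken from the paper or from any existing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergeResult:
    """Outcome of merging one value; `conflict` flags it for manual attention."""
    value: Optional[str]
    conflict: bool = False


def merge_value(base: str, edit_1: str, edit_2: str) -> MergeResult:
    """Hypothetical rule set: accept non-conflicting edits, mark conflicts."""
    if edit_1 == edit_2:
        # Both revisions agree (either unchanged or identically changed).
        return MergeResult(edit_1)
    if edit_1 == base:
        # Only the second revision changed the value: accept its edit.
        return MergeResult(edit_2)
    if edit_2 == base:
        # Only the first revision changed the value: accept its edit.
        return MergeResult(edit_1)
    # Both revisions changed the value differently: leave for a human.
    return MergeResult(None, conflict=True)
```

A translation workflow would replace the body of `merge_value` with different rules while keeping the same three-input shape, which is the kind of variation a rule-set description needs to capture.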

A naive approach describes a specific synchronization algorithm in terms of additions, deletions and conflict identification. However, such a fixed algorithm is inflexible and meets only limited needs, typically providing a solution for only one problem space. A more flexible method for describing and specifying how the synchronization is to be performed is required to allow us to categorize and solve sets of related problems.

This paper proposes a general synchronization grammar which can describe synchronization rule sets. For example, when handling three input files, we show that changes to elements can be described in terms of just seven possible permutations. Similarly, PCDATA and attribute changes can be described in terms of a fixed set of permutations. From these permutations we derive a grammar that allows synchronization algorithms and rule sets to be described precisely and provides a testable framework for their implementation.
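One way to arrive at the figure of seven, offered here as an assumption rather than as the paper's own derivation, is to count the non-empty combinations of inputs in which a given element can appear: with three files there are 2^3 - 1 = 7 such combinations. The short sketch below enumerates them; the file labels `A`, `B`, `C` and the function name `presence_permutations` are hypothetical.

```python
from itertools import product

# Hypothetical labels for the three input files.
INPUTS = ("A", "B", "C")


def presence_permutations():
    """List every non-empty set of inputs in which an element may be present."""
    combos = []
    for flags in product((True, False), repeat=len(INPUTS)):
        if any(flags):  # skip the case where the element is in no input
            present = frozenset(
                name for name, is_present in zip(INPUTS, flags) if is_present
            )
            combos.append(present)
    return combos


if __name__ == "__main__":
    for combo in presence_permutations():
        print("".join(sorted(combo)))
```

Under this reading, a rule set could be expressed as a mapping from each of the seven combinations to an action such as keep, drop or flag, and it could be tested by checking that all seven cases are covered.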

The paper applies the resulting grammar to existing synchronization tools and technologies and shows how the grammar can be applied to provide solutions for specific application areas, including document workflow and translation.

The paper will be of particular interest to architects, to project managers and to programmers working with complex documents and with workflows involving multiple concurrent changes.