Keywords: Documentation, Document creation, XML, Schema, Contracts, Authoring
The OASIS LegalXML eContracts Technical Committee was constituted nearly three and a half years ago. The TC's mission is to promote “the efficient creation, maintenance, management, exchange and publication of contract documents and contract terms”. Since its formation, the TC has considered a wide range of use cases from members with very different perspectives on contract documents and contract transactions. The TC conducted a detailed requirements analysis and a review of available XML schema for narrative documents.
The TC identified four major domains in which narrative contract documentation is relevant to contracts. The TC has decided to produce a specification for an XML schema that is suited to the back-end processing of narrative contract terms and documents. This is intended to support knowledge management and automated document production by enterprises that use contract narrative terms library systems to produce documents for multiple contract transactions. The TC expects that front-end processes such as the drafting of original contract narrative terms for individual contracts will continue to use word processor based applications for the foreseeable future.
There is no generally available XML schema that is suitable for maintenance of contract narrative terms and documents in library based systems used by law firms and other enterprises. The TC's proposed specification and schema will meet this need. It will provide a schema designed for long term storage and automated processing of narrative contract terms and documents. It will also facilitate their publication in any desired output format. This will support web based contract creation systems and complement the development of word processing based XML formats such as the Microsoft Office Open XML format and the OASIS OpenDocument XML format. Enterprises that have hitherto used proprietary processing systems for back-end contract narrative terms library management that tie content to software will be able to use off-the-shelf, standards based XML processing applications. The TC expects this will reduce costs and enable improved contract document creation services.
In September 2005, the OASIS LegalXML eContracts Technical Committee (TC) completed its evaluation of a list of available document schema as candidate host schema for an eContracts Schema. The schema chosen was the Business Narrative Markup Language [BNML Schema] developed by Elkera [http://www.elkera.com], a company operated by the author.
This paper explains the objectives of the proposed schema and a high level view of its architecture. It does not describe the schema design in detail, mainly because element naming and some design concepts are still subject to change.
The eContracts TC's mission is to promote “the efficient creation, maintenance, management, exchange and publication of contract documents and contract terms”.
The TC's charter [eContracts TC charter] then provides:
“The core scope of this activity will be to the creation of DTD(s) / schema(s) that can be used by parties:
a. Negotiating and finalizing contracts in an application neutral format;
b. Exchanging contract contents as valid XML;
c. Automating processing of contract content, for example for use in contract management applications;
d. To support the production of human readable output documents; and
e. To facilitate the use of reusable or boilerplate information within a contract.”
It is notable that the TC is not focussed on any particular vertical segment of the contracts domain.
The TC established its OASIS charter in July 2002. Its members included lawyers, software developers and researchers. Notwithstanding that a simple charter was defined fairly quickly, it soon became clear that most TC members had quite different aims in terms of the problems they wanted solved and their perceptions of the role that XML technologies may play in the solution. The TC experienced considerable difficulty defining the business problem or problems it was trying to solve and the technical approaches it should take to dealing with those problems.
In some respects this difficulty reflected the confusion that is evident in the marketplace generally about XML. There are many different XML schema presented as “XML”. Many of these schema have very different uses but the intended purpose and limits of each is often not easily appreciated. The effect is that many potential users are having difficulty defining their requirements and relating them to the different offerings under the XML banner.
The TC had to develop a clear language with which to define its requirements. Terms such as structure, semantics, metadata, machine readable all proved to be highly ambiguous. The TC found it needed to clearly define some basic concepts to avoid confusion among TC members. Key definitions included:
contract metadata: means information about a contract, particular contract terms or embedded data values that is not part of the contract narrative.
contract narrative, narrative: refers to the terms of a contract expressed in natural language.
deontic contract language: means a language that can express the rights and obligations of parties to a contract in a form that can be parsed by software applications and processed with other data to determine state information about matters governed by the contract.
The term deontic contract language is novel. Several research projects aim to create machine readable contract expression languages based on deontic logic [Stanford - deontic], a formal logic that seeks to express moral rights and obligations. The TC found the term “machine readable” unhelpful: every ASCII text character is machine readable, and so is metadata. The TC needed a distinctive term to describe a specific form of machine readability that may be capable of supplanting the contract narrative in some situations.
TC members presented a range of scenarios to represent their interests in the TC's work. The main scenarios, and the problems each sought to address, are summarized in the following case summaries.
Buyers of many goods and services must accept contract terms shown online before they can complete their purchase. Many of these contracts have ongoing effect.
The stated problems:
a. Parties transacting online may not know what contracts they have entered into;
b. they will have great difficulty determining their obligations under those contracts;
c. consumers cannot easily compare terms offered by different providers.
Many contracts, such as construction contracts, require parties to meet obligations and exercise rights at specified intervals or on the occurrence of events over a long period.
The stated problems:
a. There are frequent disputes over change authorisation in construction contracts and similar transactions where frequent variations occur.
b. It is difficult to ensure all parties have reliable information about upcoming obligations under the contract.
c. It is difficult to extract terms and embedded data values from the contract into content management systems.
d. It is difficult to access the content of external documents that are incorporated into the contract.
e. There is no reliable way to determine the state of contract events, obligations and processes.
f. It is difficult to monitor and analyse performance of parties over extended time periods.
Lawyers rely heavily on precedent documents and documents from previous transactions when preparing new documents. It is common for lawyers to use document assembly and variables substitution systems when creating new documents.
The stated problems:
a. Contract documents are created using word processing applications. These documents can’t easily be processed at convenient levels of granularity by automated systems.
b. In addition to legal maintenance, precedent documents must be revised to deal with changing file formats and proprietary processing systems. This adds to the cost of maintenance.
c. The absence of standard storage formats for contract narrative terms and documents means that it is difficult to process these documents outside the creating application. This creates proprietary ties between data and software applications. It frustrates the use of off-the-shelf tools, inhibits automated document creation, information reuse, information extraction and change traceability.
This case was really another approach to Case 2. Various models have been developed to define contract rights and obligations using formal languages that can be interpreted by computer systems. Machine readable contracts would be useful in various contract management contexts from performance monitoring to dispute resolution.
The stated problems:
a. There is no standard deontic contract language that can be used for a wide range of contracts.
b. There is no way to manage the relationship between deontic contract language and the natural language terms.
Contract negotiation is slow and expensive. The inefficiencies inherent in human contract negotiation limit the value of the transaction, particularly where rich parameter sets are involved. A program was developed by a member for computer negotiation of contracts with defined parameters.
The stated problem: There is no standard way to map a given set of negotiated contract parameters to a unique set of contract terms.
Contract users at various stages in the contract life cycle and in various transactions wish to be able to identify contract terms by their legal subject matter.
The stated problem: There is no standard framework for the description of contract terms using a taxonomy or other controlled vocabulary to support contract negotiation, document assembly and contract management.
Electronic commerce transactions are governed by a master contract. Current standards do not deal with the formation and management of these contract terms.
The stated problems:
a. In electronic commerce transactions there is currently no way to map or validate electronic transactions against their master agreement.
b. There may be no way to automatically determine if there is a master agreement.
c. Human negotiation of bilateral master agreements is too time consuming.
Under its charter, the TC is concerned with contract documents. It had to consider the preparation and use of contract documents in various transaction contexts. The TC was not formed to address contract document issues in any particular vertical segment and it considered various analytical approaches as a means to develop a holistic context for its work.
During requirements analysis, the TC considered common contract transaction frameworks to gain a better understanding of the way in which documentation is prepared and used in those transactions. This allowed it to divide contract documentation into four domains relevant to the TC's charter. The TC concluded that the requirements for and the applicability of XML technologies may be quite different in each domain.
This document provides only a summary of the key conclusions reached by the TC. A detailed statement of the TC's requirements and the supporting analysis is set out in the TC's Requirements for Technical Specification [TC requirements].
This domain covers cases where contract narrative terms are prepared as a library of terms or draft contracts that may be incorporated into or used as a starting point for new contracts in multiple future transactions. The contract narrative terms may also include contract metadata. Often, the incorporation of these terms into new contract documents will involve the use of information retrieval and automated document assembly or document creation systems.
The back-end processing domain particularly covers:
a. the preparation, maintenance and use of precedent contract terms to create new contract documents by law firms and enterprise legal departments or through automated contract negotiation;
b. the maintenance and use of standard terms in on-line business to consumer transactions which require the purchaser to enter transaction details and view and accept the contract terms on-line;
c. the maintenance and use of standard terms to generate new contract documents in highly standardized transactions where assent may be manifested by signing a printed document. In these transactions, transaction details may be entered via an electronic form or extracted from data held in a database system.
This domain covers the day to day drafting of original contract narrative terms for specific transactions between identified parties, as undertaken by lawyers and others. It includes the processes of negotiation with other parties up to and including assent to the contract.
This domain is often built on the back-end narrative terms library domain. In law firms, the back-end processing systems exist to service this domain.
In this domain, contract narrative documents are prepared solely for the purpose of meeting the needs of the parties to particular contracts. These contracts may be completed quickly or they may govern the rights and obligations of the parties during a complex set of events over a long period.
This domain covers contract documents generated electronically, either where assent is manifested on-line in business to consumer transactions via a “clickwrap” mechanism or where assent occurs by signing a printed document as in many consumer finance and sale of goods transactions. This domain is really the front facing side of the back-end domain. It is included because it involves the gathering of specific transaction information needed to complete standardized contracts.
The TC examined business to business electronic commerce transactions. These exhibit some characteristics of forms based contracting but were considered outside the scope of the TC's work.
This domain covers the expression of contract terms using a deontic contract language.
The TC envisages that contracts may be prepared using a deontic contract language where the contract terms are highly standardized and where computer systems can monitor performance of the subject matter of the contract. Examples may include service level agreements for technical systems.
Currently, there is no commercial implementation of this apart from the Contract Expression Language developed by the Content Reference Forum [CRF].
Other research efforts have produced proposals for more generally applicable deontic contract languages [Milosevic et al 2004] and [Berry and Milosevic 2005].
This domain covers the use of contract documentation after assent. Information must be derived from the contract documents to support contract management activities and dispute resolution. In this domain, documentation could exist in contract narrative form or in deontic contract language form, or in both forms.
Key characteristics of documents in this domain include:
a. Contract narrative terms are usually maintained in back-end systems for long periods of time during which processing systems may change.
b. Terms often must be updated in response to changes in the law and other factors.
c. Persons wanting to use the contract terms need subject information about the terms to facilitate access and use.
d. Narrative terms may need to be rendered in a wide variety of output formats, including print, word processing formats and HTML for use in front end domains.
e. Narrative terms may need to be processed as complete contract documents or as individual components that can be incorporated into other documents.
f. Enterprises that maintain databases of contract terms and documents usually expend a great deal of effort to provide for the accuracy and consistency of the stored terms because of their central importance in effective service delivery.
The TC concluded that the use of an XML markup language to model narrative terms and documents used in back-end processing could be highly advantageous to facilitate all these needs.
The back-end narrative terms processing domain is relevant to all the other domains identified by the TC and is the most generally useful case identified by the TC. The value of the content justifies any additional work involved in adding structure and semantics.
Currently, persons involved in the current transaction terms drafting domain use word processing software to prepare contract documents. The TC concluded that there is little likelihood that law firms and other enterprises who create contract documents will materially change the tools they use in the foreseeable future. It is likely that their word processing tools will begin to use XML file formats such as the Microsoft Office Open XML format [Microsoft XML format] (formerly WordprocessingML) or OASIS OpenDocument [OASIS OpenDocument]. These are presentation based formats and provide limited benefits in the other domains identified by the TC.
The TC was not satisfied that persons negotiating contract terms will exchange XML documents in formats other than the word processing XML formats.
The TC considered whether it could develop a standard that would involve adding additional semantics to these word processor based XML formats but concluded that the value of this is unclear at this time. Persons involved in contract drafting have little time or interest in adding markup to their documents. There is no reason to expect them to create markup for use by another party. Even if in particular contexts they will do so, the TC could not decide whether it should work with the Microsoft Office Open XML format or OpenDocument. Based on recent actions by European governments and the US State of Massachusetts to promote or require the use of OpenDocument formats, there appears to be a looming schema war between these two models [Wikipedia “OpenDocument”].
The TC considered whether it should attempt to develop a metadata or embedded markup model for use with one or both of the word processing XML formats. The TC could not identify commercially practicable requirements for a specification covering the contract drafting and negotiation domain.
The TC concluded that the use of a suitable XML format to model narrative contract terms and contract documents in the back-end processing domain would provide greater flexibility to users in the contract drafting domain. It will reduce the costs of maintaining precedent databases, promote the availability of automated document assembly systems and enable the production of contract documents in the most convenient format, regardless of commercial and technical developments in software used by word processor systems.
Form contracts involve the use of standard contract narrative terms that cannot be negotiated between transactions. The only variables are matters such as product description, quantity, price, delivery etc. In either an on-line environment or off-line, the consumer or an agent of the service provider enters transaction variables into a form to provide the additional information required for the complete contract.
In click through contracts, the contract documentation will consist of an invoice generated from the entered data and a set of standard terms. Commonly, these remain separate.
In off-line transactions, such as insurance and consumer finance, a document may be generated that combines the transaction data and the narrative terms.
The TC considered whether its specification should address issues relating to the formation of click through contracts and access to the terms of contract. The TC could not identify commercially practicable requirements for a specification covering this domain. A specification for the back-end processing domain is likely to be relevant to the maintenance and processing of the standard terms in forms based contracts.
The deontic contract language domain is closely aligned with the contract management domain because it represents a distinct approach to contract management. This is a highly specialized area that may require considerable further research before commercial adoption. The TC had some expertise in the area but felt that it required broader representation from potential implementers to take this forward.
Currently, information required by contract management systems must be manually extracted from the contract documentation and entered into a database system. In many standardized transactions, transaction data may be held in a database and used to generate standard form contract documents in the form contract domain.
The TC considered that contract management information could be extracted from information suitably defined by XML markup in narrative contract documents or from contracts expressed in deontic contract language.
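The kind of extraction contemplated here can be illustrated with a minimal Python sketch. It assumes a hypothetical `value` element carrying a `name` attribute to mark embedded data values inside a clause; these names, and the sample clause itself, are illustrative only and are not taken from the TC's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <item>, <title>, <text> and <value name="..."> are
# illustrative names only, not elements defined by the TC's schema.
CLAUSE = """
<item id="c5">
  <title>Delivery</title>
  <text>The Supplier must deliver the Goods by
    <value name="deliveryDate">2006-03-31</value> to
    <value name="deliverySite">Sydney</value>.</text>
</item>
"""

def extract_values(xml_text):
    """Collect every embedded data value in the clause, keyed by its name."""
    root = ET.fromstring(xml_text)
    return {v.get("name"): (v.text or "").strip() for v in root.iter("value")}

values = extract_values(CLAUSE)
print(values)
```

A contract management system could load a dictionary like `values` into its own database, leaving the narrative text untouched.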
The TC considered deontic contract language to be outside the scope of its initial specification.
The TC concluded that it could only address the contract management domain if there was a likelihood of narrative contract documents being available in a suitable XML format. For the reasons considered in connection with the contract drafting and negotiation domain, the TC considered that XML documents are not likely to be used as the formal artifact of contract terms, except in specialized situations. In negotiated contracts the TC did not identify any change from current practice where a print rendition is likely to be the formal contract artifact. The XML document from which that contract is generated may not contain all contract terms. Some terms may be altered by hand on the printed document before signature.
Even in highly standardized transactions, the TC was not satisfied on the information currently available that embedding markup for contract management purposes in narrative contract documents was likely to be of practical benefit. It is just as likely that contract management information will be maintained separately from the contract narrative. This might be achieved using a separate XML representation to facilitate communication between contract management database systems.
There is also the possibility of attaching semantic information to standard contract documents such as those used in on-line transactions, using RDF, OWL and similar approaches. The TC could not identify business requirements for development of a specification at this time.
While there may be opportunities for further developments in this domain in the future, the TC lacks the necessary representation to further develop it at this time.
The functional requirements developed by the TC can be summarized in a few points:
a. Common content objects such as paragraphs and clauses from which contract terms are built must be identifiable by XML markup so they can be processed as distinct objects or content chunks in document assembly and other processing systems.
b. Contract documents are created with other document types in law firms and other enterprises. The schema design must define common content objects that can be adopted by another standards body responsible for developing a schema applicable to other legal documents that are commonly prepared by the same people using the same systems as contract documents.
c. The schema must provide a model for users to add contract metadata and embedded data values to contract documents and to distinct content objects defined by the schema. The schema must make provision for common metadata fields required by document management, document assembly and publishing applications such as:
d.
The TC looked at existing XML schema and identified several different XML markup models. These include:
a. Rich, presentation based schema such as the Microsoft Office Open XML format and OASIS OpenDocument;
b. Simple web presentation schema such as XHTML 1.0 [XHTML 1 schema];
c. Generic structural markup schema such as DocBook [DocBook schema], DITA [DITA 1 schema], TEI [TEI schema], Elkera BNML and possibly XHTML 2.0 [XHTML 2 schema] that can be applied to a wide range of documents.
Presentation based schema do not provide the hierarchical structure needed to meet the TC's requirements. The TC concluded that it should adopt a generic structural markup model for the back-end narrative terms domain. The schema should define the hierarchical structure of the contract document so that the major components of contracts, including parties, recitals, body, signatures, schedules, clauses and paragraphs can be identified for automated processing and information retrieval purposes.
It is not uncommon for contract documents to contain structures similar to the following example, often in the same document:
The TC concluded that each of the structures at the third level is structurally the same thing and should be represented in the same way in the hierarchical markup.
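As a minimal sketch of this conclusion, the Python below assumes a single recursive `item` element with an optional `title`. The element names and sample text are hypothetical and are not taken from the TC's schema, whose element naming is still subject to change. The point is only that titled and untitled structures at the same level can share one representation and be processed by the same code.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one recursive <item> element with an optional <title>.
# The names are illustrative only, not the TC's schema.
SAMPLE = """
<body>
  <item><title>Payment</title>
    <item><text>The Buyer must pay the Price.</text>
      <item><title>Interest</title><text>Interest accrues on late payments.</text></item>
      <item><text>Payment must be in cleared funds.</text></item>
    </item>
  </item>
</body>
"""

def outline(element, prefix=""):
    """Return (number, title-or-None, text-or-None) for every nested item."""
    rows = []
    for position, item in enumerate(element.findall("item"), start=1):
        number = f"{prefix}{position}"
        rows.append((number, item.findtext("title"), item.findtext("text")))
        # Recurse with the same function: every level uses the same element.
        rows.extend(outline(item, prefix=number + "."))
    return rows

rows = outline(ET.fromstring(SAMPLE))
for number, title, text in rows:
    print(number, title or "(untitled)", "-", text or "")
```

Here the two third-level items (one titled, one untitled) are both plain `item` elements, so no special case is needed to number or retrieve them.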
The TC's specification will address the back-end narrative terms library and document generation domain. It will provide a model that can be used by persons who prepare contract documentation for back end processing in most of the other domains identified by the TC.
The TC analysed each of the structural markup schema and concluded that the Elkera BNML schema provided the closest fit to its requirements. The TC hoped to find an existing schema with developed applications that could be used with the TC's schema. The only other possible candidate was DocBook. DocBook was developed for an entirely different purpose and would require extensive change. The DocBook section element has a required title which precludes its use to represent all the third level structures in Example 1. It also lacks other important features required for contracts markup. A new schema loosely based on DocBook would create an illusion of compatibility but little actual benefit to a user group that has no particular familiarity or experience with DocBook.
The BNML Schema is a new schema with just a few supporting applications provided by Elkera. It was developed with most of the TC's requirements in mind and provides a complete model for structural, hierarchical markup of contract narrative terms and documents. It is possible that a number of changes will be made by the TC as the schema is adapted to the specific requirements of the TC. Whichever approach is taken by the TC, application development will have to start afresh.
The eContracts TC will produce a specification and XML schema to define simple patterns for the hierarchical or structural markup of the basic narrative structures found in contracts and most similar documents.
The proposed schema will provide these features:
a. It will represent high level structures of narrative contract documents such as clauses or sections, parties, recitals, signatures, schedules etc.
b. It will provide maximum practicable separation of presentation from content to provide the greatest flexibility in content re-use and production of user specific renditions.
c. It will be a foundation schema that users can extend for their particular needs. It will not be prescriptive or provide a smorgasbord of elements for all possible situations. This means that it will not provide a comprehensive model for exchange of XML documents between different users. The TC did not identify any general need to exchange XML documents between back-end processing systems. The standardization of the basic clause structures will provide the compatibility required to enable processing systems from different vendors to be readily adapted to most user implementations.
d. The schema is designed to provide a simple and strict hierarchical representation of common content objects found in contract narrative. It will not require content authors to make difficult semantic distinctions in the application of structural markup. There should be only one way to mark up most content. The simple markup will permit content components to be inserted at any desired level of the document hierarchy.
e. The simple, strict content models should be easy to implement in XML authoring applications so that the schema can be used in front-end XML drafting systems, if users wish to adopt a structural schema as an alternative to the word processing XML formats. It should also provide a favourable target for automated format transformations from word processing XML formats.
f. The schema will provide for the incorporation of other vocabularies such as UBL and Dublin Core where these overlap with contracts requirements. However, it will not require their use.
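One common way to incorporate an external vocabulary without requiring it is through XML namespaces. The Python sketch below is an illustration only: the `contract`, `metadata` and `body` element names and the sample title are hypothetical, and the paper does not say how the TC's schema will admit other vocabularies. The Dublin Core element set namespace URI is the published one.

```python
import xml.etree.ElementTree as ET

# Published namespace of the Dublin Core Metadata Element Set 1.1.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# Hypothetical host elements; these are not names from the TC's schema.
contract = ET.Element("contract")
metadata = ET.SubElement(contract, "metadata")
ET.SubElement(metadata, f"{{{DC}}}title").text = "Master Supply Agreement"
ET.SubElement(metadata, f"{{{DC}}}language").text = "en"
ET.SubElement(contract, "body")

serialized = ET.tostring(contract, encoding="unicode")
print(serialized)

# A consumer that understands Dublin Core can read the title by namespace;
# one that does not can ignore the foreign elements entirely.
reparsed = ET.fromstring(serialized)
dc_title = reparsed.findtext(f"metadata/{{{DC}}}title")
print(dc_title)
```

Because the Dublin Core elements live in their own namespace, a document that omits them is still processed the same way, which matches the intent that such vocabularies be available but optional.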
The eContracts structural schema model will address the widest possible range of uses in back-end contract document processing systems. Based on this use case the TC aims to reduce the cost of maintenance of contracts documents in back-end processing systems and to promote the wider availability of automated document creation systems.
It is expected the eContracts schema will provide these benefits:
a. It will provide a non proprietary, standards based schema to facilitate the long term storage and maintenance of precedent contract narrative terms and documents by law firms and other enterprises who use library based systems for the preparation of narrative contract documents.
b. It will allow off-the-shelf XML based content management and processing tools to be used for the production, maintenance and use of precedent contract narrative terms and documents. Currently, there is no generally available schema that is suitable for this purpose.
c. It will enable high value contract terms to be transformed into any desired rendition such as HTML, RTF, Microsoft Office Open XML format, OpenDocument or any other format. Valuable resources can be managed without concern about shifting fortunes of particular word processing formats.
d. The eContracts schema will provide a foundation that can be used to deal with common processing and publishing requirements for contract narrative in all domains identified by the TC.
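The transformation benefit can be sketched in a few lines of Python. The example assumes a hypothetical structural source in which clauses are nested `item` elements with optional `title` and `text` children (illustrative names, not the TC's schema) and produces a simple HTML rendition. Numbering and heading styles are generated at render time, so none of that presentation is stored in the source.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structural source; element names are illustrative only.
SOURCE = """
<body>
  <item><title>Term</title>
    <item><text>This agreement starts on the Commencement Date.</text></item>
    <item><text>Either party may end it by giving written notice.</text></item>
  </item>
</body>
"""

def render_html(element, prefix=""):
    """Render nested items as HTML lines, generating clause numbers on the fly."""
    parts = []
    for position, item in enumerate(element.findall("item"), start=1):
        number = f"{prefix}{position}"
        title = item.findtext("title")
        text = item.findtext("text")
        if title:
            parts.append(f"<h2>{number} {escape(title)}</h2>")
        if text:
            parts.append(f"<p>{number} {escape(text)}</p>")
        parts.extend(render_html(item, prefix=number + "."))
    return parts

html_lines = render_html(ET.fromstring(SOURCE))
print("\n".join(html_lines))
```

A second function over the same source could emit a different rendition, such as RTF or a word processing XML format, without any change to the stored terms. In practice a transformation language such as XSLT would be the more usual tool for this job; Python is used here only to keep the sketch self-contained.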
The TC proposes to release its specification early in 2006. The TC aims to provide a generally useful schema that will address basic processing and publishing issues. It is hoped that this will provide a platform on which more specialized developments can be added. Future directions will depend on user feedback to the TC's initial specification.
Further development will also require the presence of stakeholders with direct commercial interests in the development of particular solutions.
The TC has taken the broadest possible approach to defining a specification for the XML markup of contract narrative terms and documents. The intention is that it may be used to improve the management of contract narrative terms and documents that have a high resource value because they can be re-used in many future transactions. This resource value justifies the extra effort that will be involved in adding XML markup, particularly contract semantics markup.
The two word processor based XML schema (Microsoft Office Open XML format and OpenDocument) cannot form a useful basis for the TC's work at this time. Firstly, these presentation oriented schema only provide very limited value in meeting the requirements defined by the TC. Secondly, it would be a mistake for the TC to try to pick a winner in the emerging schema wars. The TC's proposed specification meets the highest ideal of generic structural markup and allows users to transform to their desired output schema whether XHTML, XSL/FO, Microsoft Office Open XML format, OpenDocument or any other.
The TC has not selected any particular contract vertical market for its specification. It is hoped that its specification will be applicable to contract narrative markup in most industry sectors.
The proposed specification is a general tool that should permit users to adapt it to their own needs. There is no identified need for a universal contract data exchange model at this time. At this stage, the value will come from the emergence of a family of supporting tools that can process the basic markup structures in contract documents and be easily adapted to specific modifications required by user enterprises.
If the TC's initial specification is successful, further developments will likely depend on interested parties in particular vertical markets developing more specialized standards to meet their particular needs.
Peter Meyer is Managing director of Elkera Pty Limited, XML content management software developers and consultants located in Sydney, Australia. Peter is a former commercial lawyer who left legal practice 12 years ago with the motivation of developing a holistic knowledge management system for lawyers. Peter has worked with SGML & XML document markup applications for over 10 years. He is a regular presenter at XML conferences in Australia. Peter is an active member of the OASIS LegalXML eContracts Technical Committee.