How schema-validity is different from being married

Track: Core Technologies, Deploying XML

Audience Level: High Level/Technical view

Time: Wednesday, November 16 14:00

Author: C. M. Sperberg-McQueen, W3C

Keywords: W3C XML Schema

Abstract:

Validation with some schema languages (e.g., XML 1.0 and SGML DTDs) is a black-or-white question: either the document is (wholly) valid or it is not valid (at all). There are no gray areas: a document cannot be mostly valid any more than one can be a little bit married. The theoretical information content of a validation result in these systems is thus exactly one bit: yes/valid or no/invalid. In practice, good validators try to provide a little more information about errors, but the quality of error diagnostics varies and error handling is not standardized.

In other languages (e.g., XML Schema), validity assessment is designed to provide more than one binary digit of useful information. XML Schema allows various forms of partial validation. Validation can start at an element other than the root. Wildcards can specify that the elements they match should be skipped rather than validated (black-box processing: the data must be well-formed XML, but the validator does not look inside). And wildcards can specify 'lax' validation: matching elements must be well-formed XML, and if the schema has declarations for them they will be validated, but the absence of declarations does not make the container invalid.
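The wildcard behavior described above can be sketched as a small decision function. This is only an illustrative model, not a validator: the names `ProcessContents` and `wildcard_action` and the returned strings are hypothetical, and the `STRICT` case (the third value that XML Schema's `processContents` attribute can take) is included only for contrast.

```python
from enum import Enum


class ProcessContents(Enum):
    """Values of the processContents attribute on an XML Schema wildcard."""
    STRICT = "strict"
    LAX = "lax"
    SKIP = "skip"


def wildcard_action(mode: ProcessContents, has_declaration: bool) -> str:
    """Say what happens to an element matched by a wildcard (hypothetical helper).

    has_declaration: whether the schema declares the matched element.
    """
    if mode is ProcessContents.SKIP:
        # Black-box processing: well-formedness only, never look inside.
        return "skip"
    if has_declaration:
        # Both lax and strict validate when a declaration is available.
        return "validate"
    if mode is ProcessContents.LAX:
        # No declaration, but that does not make the container invalid.
        return "leave unvalidated"
    # Strict wildcard with no declaration: the container cannot be valid.
    return "container invalid"


for mode in ProcessContents:
    for declared in (True, False):
        print(mode.value, declared, "->", wildcard_action(mode, declared))
```

The point of the sketch is that only one of the six combinations forces an error; the others leave the container's validity untouched while validating more or less of the matched content.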

In XML Schema, schema-validity is captured by three properties of each item. The first, [validation attempted], records whether validation of the item was attempted; its values are full, none, and partial. The second, [validity], records whether the item is valid; its values are valid, invalid, and notKnown. The third, [schema error code], applies when the item is not valid: it is a list of error codes (references to XML Schema validation rules) explaining why.
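To make the per-item view concrete, here is a minimal sketch of how the first two properties might roll up from children to parent. It is a simplified model under stated assumptions: the `Item` class and its fields are hypothetical, attributes are not modeled separately, [schema error code] is omitted, and the validity rule leaves out the specification's special handling of children that were required to have a declaration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """Hypothetical stand-in for an element after assessment."""
    name: str
    strictly_assessed: bool      # was this item itself validated?
    locally_valid: bool = True   # did it satisfy its own declaration?
    children: List["Item"] = field(default_factory=list)

    def validation_attempted(self) -> str:
        kids = [c.validation_attempted() for c in self.children]
        if self.strictly_assessed and all(k == "full" for k in kids):
            return "full"
        if not self.strictly_assessed and all(k == "none" for k in kids):
            return "none"
        return "partial"

    def validity(self) -> str:
        if not self.strictly_assessed:
            return "notKnown"
        # Simplification: only an invalid child makes the parent invalid.
        if self.locally_valid and all(
            c.validity() != "invalid" for c in self.children
        ):
            return "valid"
        return "invalid"


# An order whose <extension> child was matched by a skip wildcard.
doc = Item("order", True, children=[
    Item("customer", True),
    Item("extension", False),
])

for item in [doc, *doc.children]:
    print(item.name, item.validation_attempted(), item.validity())
```

In this model the `order` element comes out as valid with validation attempted partial, while its skipped child is notKnown with validation attempted none: a result that a single valid/invalid bit cannot express.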

This paper discusses why it can be dangerous and unhelpful to reduce validity to a single bit of information, and how it can be helpful to take a richer view of validity as a property with several values, belonging not just to the document as a whole but to each element and attribute in the document.