XLinq: XML Programming Refactored (The Return Of The Monoids)

Track: Core Technologies, End-User Applications, Case Studies

Audience Level: Technical view

Time: Wednesday, November 16 14:00

Author: Erik Meijer, Microsoft Corporation

Author: Brian Beckman, Microsoft Corporation

Keywords: XML, XQuery, Visual Basic, C#, Monoids

Abstract:

New programming languages such as XQuery, XJ, and Xtatic are changing the landscape of XML programming by providing deep language support for querying and constructing XML. By carefully factoring the fundamental principles underlying these languages, we can both reapply XML query technologies to other data models and simplify the construction and manipulation of XML values in languages that do not natively support XML.

Most query languages, including XQuery and SQL, can be factored into (1) general operations on collections, and (2) domain-specific operations on the elements of these collections. Examples of general operations on collections are mapping functions over collections (projection), filtering (removing) elements from collections, grouping (partitioning) collections, sorting collections, and aggregating (folding) an operation over a collection to reduce it to a single value.
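As an illustration only, the general collection operations listed above can be sketched in Python, which is not the language of the paper. The `orders` data and all variable names are assumptions made up for this sketch; nothing here is specific to XML or to any one data model.

```python
from functools import reduce
from itertools import groupby

# Hypothetical sample data: any collection of values would do.
orders = [
    {"customer": "ann", "total": 30},
    {"customer": "bob", "total": 5},
    {"customer": "ann", "total": 12},
    {"customer": "cid", "total": 50},
]

# Projection: map a function over the collection.
totals = list(map(lambda o: o["total"], orders))

# Filtering: keep only the elements that satisfy a predicate.
large = list(filter(lambda o: o["total"] >= 10, orders))

# Sorting: order the collection by a key.
by_total = sorted(orders, key=lambda o: o["total"])

# Grouping: partition by a key (groupby needs input sorted on that key).
by_customer = sorted(orders, key=lambda o: o["customer"])
groups = {
    customer: [o["total"] for o in group]
    for customer, group in groupby(by_customer, key=lambda o: o["customer"])
}

# Aggregation: fold a binary operation over the collection.
grand_total = reduce(lambda acc, t: acc + t, totals, 0)
```

The same five operations would apply unchanged to rows of a table or to a sequence of XML nodes; only the functions passed in would differ.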

Examples of domain-specific operations on elements of XML collections include selecting along the structural axes (children, attributes, descendants, siblings, etc.), and constructing new elements and attributes and attaching them under existing XML nodes.
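To make the domain-specific side concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` module. It is an analogy, not the API proposed in this paper, and the `library` document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<library>"
    "<book year='2005'><title>A</title></book>"
    "<book year='1999'><title>B</title></book>"
    "</library>"
)

# Child axis: iterating an element yields its direct children.
children = [child.tag for child in doc]

# Attribute axis: read an attribute from each selected child.
years = [book.get("year") for book in doc.findall("book")]

# Descendant axis: iter() walks the whole subtree in document order.
titles = [title.text for title in doc.iter("title")]

# Construction: create a new element with an attribute and attach it
# under an existing node.
note = ET.SubElement(doc[0], "note", {"lang": "en"})
note.text = "new"
```

Each axis selection returns an ordinary Python collection, so the generic operations (map, filter, sort, fold) compose with it directly.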

Database and programming-language researchers, including those working on Haskell, Scala, Python, and Comega, discovered long ago that generic operations on collections are all instances of the concept of monoids or monads [Monads]. Considered as such, they satisfy many coherent algebraic properties and allow elegant syntactic sugar in the form of query comprehensions. For example, FLWOR expressions in XQuery are a form of query comprehension and are not in any way specialized to operating over collections of XML nodes. On the contrary, FLWOR expressions would be equally useful both in ordinary programming for complex queries over collections of objects and in relational database systems for queries over tables of rows.
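The link between monoids, monads, and comprehensions can be sketched for the list case in Python. The helper names `unit`, `concat_all`, and `bind` are chosen for this sketch and are assumptions, not names from the paper. Lists form a monoid under concatenation with the empty list as identity, and the monadic bind is a fold of that monoid over the mapped results.

```python
from functools import reduce

def unit(x):
    """Wrap a single value in a collection."""
    return [x]

def concat_all(lists):
    """Fold the list monoid (operation +, identity []) over a list of lists."""
    return reduce(lambda acc, xs: acc + xs, lists, [])

def bind(xs, f):
    """Apply f to every element and flatten the resulting collections."""
    return concat_all([f(x) for x in xs])

# Hypothetical collection of plain objects, not XML nodes.
people = [("ann", 31), ("bob", 17), ("cid", 45)]

# A query comprehension: iterate, filter, and project in one expression.
adults = [name.upper() for (name, age) in people if age >= 18]

# The same query written directly with the monadic operations.
adults_desugared = bind(
    people,
    lambda p: unit(p[0].upper()) if p[1] >= 18 else [],
)
```

Both forms compute the same result; the comprehension is only syntactic sugar over `bind` and `unit`, which is why it carries over to any collection type that supplies those two operations.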

We propose in this paper that general-purpose languages should extend their query capabilities based on the theory and established practice of monads, allowing programmers to query any form of data, using a broad notion of collection. This proposal is in opposition to the common practice of inserting specialized sub-languages and APIs for querying XML one way, databases another way, and objects a third way. The LINQ project at Microsoft leads by example, providing basic monadic operations on collections of arbitrary origin, on top of which the C# and Visual Basic programming languages have implemented generic query comprehension syntaxes.

With respect to domain-specific XML operations, we notice that current APIs, such as the DOM, are very imperative, highly complex, and irregular when compared to (1) the regularity of the XPath axis view and (2) expression-oriented element and attribute construction as offered by XML-centric languages. The current DOM is also rather heavyweight. We propose an alternative, lightweight, rational, and simple API, called XLinq, designed specifically to mesh with the general query infrastructure of LINQ. XLinq is expression-oriented rather than imperatively statement-oriented like the DOM, and supports node-centric rather than document-centric creation of XML, allowing the structure of code to mirror the structure of data.
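The idea of expression-oriented, node-centric construction can be approximated in Python. The `element` helper below is hypothetical and written only for this sketch on top of the standard `xml.etree.ElementTree` module; it is not the XLinq API, and the contact data is invented.

```python
import xml.etree.ElementTree as ET

def element(tag, *content, **attributes):
    """Hypothetical helper: build a node from its content in one expression."""
    node = ET.Element(tag, attributes)
    for item in content:
        if isinstance(item, ET.Element):
            node.append(item)
        elif len(node) == 0:
            # Text before any child element becomes the node's text.
            node.text = (node.text or "") + str(item)
        else:
            # Text after a child element is stored as that child's tail.
            node[-1].tail = (node[-1].tail or "") + str(item)
    return node

# Hypothetical source data for the query.
people = [("Ann", "555-0100"), ("Bob", "555-0101")]

# The nesting of the expression mirrors the nesting of the resulting XML,
# and a comprehension supplies the repeated children. No document object
# is needed to create the nodes.
contacts = element(
    "contacts",
    *[
        element("contact", element("name", name), element("phone", phone), id=str(i))
        for i, (name, phone) in enumerate(people, start=1)
    ],
)

xml_text = ET.tostring(contacts, encoding="unicode")
```

A statement-oriented equivalent would create each node, set its text and attributes, and append it to its parent in separate steps, so the shape of the output would no longer be visible in the shape of the code.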