Keywords: content management, content, repurposing, metadata, semantic web, web services, RDF, RSS, remixing
RSS (Really Simple Syndication; there are other acronyms but this is my personal favourite) was developed in the 1990s to enable automatic web surfing. An RSS 'feed' is simply an XML file using a very simple XML format with the latest updates to a site. Users can subscribe to an RSS feed using an RSS reader and be automatically kept apprised of that site's updates without manually having to check the website.
RSS was popularized by blogs, but it has been adopted by mainstream media sites like the New York Times, BBC, the Economist etc., by search engines like Yahoo, Google, etc., by RSS search engines like PubSub, Feedster, Technorati, etc. and is being built (or is already built) into popular software and operating systems like Microsoft Office, Windows Vista, Mac OS X, Safari, Firefox, etc.
The widespread adoption of RSS has enabled a user generated content revolution. How? Since RSS is XML, it is machine readable which enables near real time search engine indexing. This allows users to have almost real time distributed conversations on the web. The early adopters and VERY early majority on the web are now publishing their content using RSS to take part in this conversation.
A large part of this conversation is through "RSS Remixing". In its crudest form, RSS Remixing is bloggers subscribing to an RSS feed, and remixing it by quoting part of the text, adding some commentary and then publishing this on their blog. This updates their RSS feed which others can remix as well.
This has been done manually by users and in code by developers since the early days of RSS and is now being done increasingly automatically using tools such as Drupal (an open source content management system), Radio UserLand, FeedBurner and many, many other tools.
These tools enable any power user and early adopter to be an RSS remixer without any software or XML knowledge. Previously, this power was only available to developers and XML experts!
This high-level presentation briefly introduces RSS and blogs, illustrates with examples how users have remixed RSS in the past and how they remix it today, and discusses the implications for knowledge management and sharing in a world where this kind of remixing is ubiquitous, where RSS is mainstream (as it will be after 2007, when it is part of Windows Vista) and where it eventually becomes transparent.
RSS Remixing started with the human global online conversation. Therefore, this paper will begin with an introduction to this conversation and then discuss the past, present and future of RSS remixing.
RSS (Really Simple Syndication) is a two-part [1] indication to subscribers and search engines that a website has updated. The website is most commonly a blog, but increasingly it is any kind of website and, in the future, it will be anything on the internet that has updates. The two parts are:
1. An indication that the site has updated. This is called a 'ping'.
2. A list of updates to the site. This is in XML format and is also called an RSS "feed".
RSS is an XML file format. There are other expansions of the acronym, but Really Simple Syndication is my favourite.
RSS is a very simple XML format which has three popular variants:
Currently, RSS 2.0 is the most widely used, but all the variants are equivalent from the point of view of a "user remixing RSS". That phrase is unwieldy, so henceforth such users will be referred to as "remixers".
Here's an example of an RSS 2.0 file modified to show two items from the excellent [Hammersley2005], Developing Feeds with RSS and Atom. It demonstrates how simple RSS is.
Example 1. Barebones RSS 2.0 Feed Example
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>A Very Simple Feed</title>
    <link>http://example.org/index.html</link>
    <description>A Very Simple RSS 2.0 Feed</description>
    <item>
      <pubDate>Thu, 03 Jan 2002 00:00:02 GMT</pubDate>
      <description>Roland cool description for item 1</description>
      <author>Roland Tanglao</author>
    </item>
    <item>
      <pubDate>Thu, 03 Jan 2002 00:00:01 GMT</pubDate>
      <description>Jim cool description 2</description>
      <author>Jim Smith</author>
    </item>
  </channel>
</rss>
The example omits the enclosure element which is used for podcasting. This subelement of the item tag is a link to an audio, video or other multimedia resource and is used by podcasting clients to automatically download audio to an MP3 player.
The most important thing to note here from a remixer's point of view is that the description element, which is a subelement of the item tag and contains the content of a blog post, is an unstructured "blob of text". The text may be HTML, but that is all one knows. The HTML is not in any a priori known structure.
This is one of the great strengths and weaknesses of RSS. It is a strength because you can put anything in there and it is very simple to generate and can therefore even be hand-coded. It is a weakness because this means that software cannot assume or derive anything about the blob of text.
Attempts are being made to "fix" this via microformats, an approach also known as structured blogging. More on this in Section 5.1, “Better Data: Microformats”.
A ping is an indication to search engines and other services that a site has updated. It started off as a standalone message just to indicate that a website has updated but now it is usually generated automatically together with an updated RSS feed.
Pings are implemented using a very simple [XML-RPC] message, as documented in the [weblogs.com update notification spec].
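To make the ping concrete, here is a minimal sketch in Python of the XML-RPC message body for a ping. It assumes the `weblogUpdates.ping` method name with two string parameters (site name and site URL) as described in the weblogs.com notification spec; the site name and URL are made up for illustration, and nothing is sent over the network.

```python
import xmlrpc.client


def build_ping(site_name: str, site_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    # dumps() serializes the parameters and method name into a <methodCall> document.
    return xmlrpc.client.dumps((site_name, site_url), methodname="weblogUpdates.ping")


# Hypothetical blog name and URL, used only for illustration.
ping_xml = build_ping("Joe's Blog", "http://example.org/")
print(ping_xml)
```

Actually sending the ping would mean posting this body to a ping server, for example through `xmlrpc.client.ServerProxy`; the server address depends on the service and is deliberately left out here.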
Figure 1. How the Global Online Conversation Works
Referring to the "How the Global Online Conversation Works" figure:
For the purposes of this example, "Joe" and "Teresa" are two bloggers.
1. Joe writes something and publishes it to his website with RSS. It's probably a blog, but it could be any site, like the BBC or the New York Times, that publishes RSS.
2. Joe's system updates his site's HTML, updates his RSS file and sends a 'ping' message to the 'Aggregation Ping Server' indicating that his site has updated.
3. Search engines like Google and RSS specific services like Feedster, Technorati and PubSub periodically ask the Aggregation Ping Server, "Which sites have updated?".
4. Because Joe's site sends pings, has an RSS file, and is both easy to update and frequently updated, it gets re-indexed more quickly, and Joe's search engine rank is higher than that of a 'normal site' without RSS.
5A. Teresa uses a program called an RSS reader (e.g. FeedDemon [http://www.feeddemon.com/] on Windows, NetNewsWire [http://ranchero.com/netnewswire/] on Mac, or an online web application like Bloglines [http://bloglines.com/]) to subscribe to Joe's site. The RSS reader (also called an RSS aggregator) checks Joe's RSS file for updates periodically (usually once/hour or once per day) and notifies her of Joe's updates. Teresa no longer wastes time manually surfing Joe's site. She just checks her RSS reader. As a result, Teresa's information flow is more efficient and she can monitor more sites in less time.
OR
5B. Teresa either does not use an RSS reader or does not subscribe to Joe. She finds his writing through a search engine. Joe's writing is easier to find because his search engine ranking is higher, which in turn is because his site has RSS.
6. Teresa disagrees with Joe and posts her response to her blog.
7. Teresa's system updates her site's HTML, updates her RSS file and sends a 'ping' message to the 'Aggregation Ping Server' indicating that her site has updated.
The above seven step cycle can now repeat with Joe playing the role of Teresa and vice versa. And also it is not confined to Joe or Teresa. It can be ANYBODY on the internet. And a blog is not needed to participate. All that is required is something that publishes RSS to the internet. It can be something as rudimentary as a social bookmarking service like del.icio.us [http://del.icio.us/].
Voila! Near real time, global, distributed conversations between two or more people, as shown in the Simplified Global Conversations Diagram.
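The periodic check that an RSS reader performs in step 5A can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the feed is passed in as a string rather than downloaded, the feed contents are invented, and an item is identified by its guid, link or description, whichever is present first. Real readers are considerably more careful.

```python
import xml.etree.ElementTree as ET


def new_items(feed_xml: str, seen: set) -> list:
    """Return descriptions of items not seen on earlier polls, and remember them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # Identify an item by guid if present, else link, else description.
        key = item.findtext("guid") or item.findtext("link") or item.findtext("description")
        if key not in seen:
            seen.add(key)
            fresh.append(item.findtext("description"))
    return fresh


# Two snapshots of Joe's (hypothetical) feed, one per poll.
POLL_1 = """<rss version="2.0"><channel><title>Joe</title>
<item><description>Joe's first post</description></item>
</channel></rss>"""

POLL_2 = """<rss version="2.0"><channel><title>Joe</title>
<item><description>Joe's second post</description></item>
<item><description>Joe's first post</description></item>
</channel></rss>"""

seen_keys = set()
first = new_items(POLL_1, seen_keys)   # everything is new on the first poll
second = new_items(POLL_2, seen_keys)  # only the newly added item is reported
print(first, second)
```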
In this paper, RSS Remixing is defined as taking an RSS feed in, processing it somehow (manually or programmatically) and producing an RSS feed out. Users want to do this to extract the knowledge they want out of the Global RSS Conversational 'noise' using search, analytics, etc.
Of course, developers have always done this and will continue to do this by creating software. This paper focuses on remixing without writing code and without using tools that require detailed XML domain knowledge like XSLT.
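Although this paper concentrates on code-free remixing, the definition above (RSS feed in, some processing, RSS feed out) is easy to illustrate for readers who do write code. The following Python sketch keeps only the items whose description mentions a keyword and appends the remixer's commentary to each. The function name, the feed contents and the note format are all invented for this illustration.

```python
import xml.etree.ElementTree as ET


def remix(feed_xml: str, keyword: str, commentary: str) -> str:
    """RSS in, RSS out: keep items mentioning keyword and annotate them."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    for item in channel.findall("item"):  # findall returns a list, so removal is safe
        desc = item.find("description")
        text = (desc.text or "") if desc is not None else ""
        if desc is not None and keyword.lower() in text.lower():
            desc.text = f"{text} [Remixer's note: {commentary}]"
        else:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


# Hypothetical source feed with one relevant and one irrelevant item.
SOURCE = """<rss version="2.0"><channel>
<title>A Very Simple Feed</title>
<link>http://example.org/index.html</link>
<description>A Very Simple RSS 2.0 Feed</description>
<item><description>Notes from the XML 2005 conference</description></item>
<item><description>Lunch in Atlanta</description></item>
</channel></rss>"""

remixed = remix(SOURCE, "xml", "worth reading")
print(remixed)
```

The output is itself an RSS feed, so another remixer could subscribe to it and remix it again.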
In the early days (around 1999-2001), the original remixers remixed manually. They cut and pasted HTML code from their browsers into their blogging tool, added their annotations and blogged it back into the global conversation. This worked well when there were only a few thousand people in the global RSS conversation and one only paid attention to a few.
Developers and alpha geeks were the first to take this to an extreme and to start monitoring many, many conversations. It did not take long for developers to come up with tools to facilitate this and remove the manual tedium. These tools included RSS aggregators (the author prefers the term RSS reader) which automatically downloaded the new information (using RSS) from the sites you subscribed to.
One of the first was Radio UserLand [http://radio.userland.com/]. It was a combined RSS reader and blogging system that automated what people were doing manually. From the reader, a remixer could remix (or "re-blog") an item from one of their RSS subscriptions. With Radio UserLand (see the Radio UserLand figure), the process was simple. To re-blog, one simply clicked on "POST" next to the item, added commentary and clicked "POST to Weblog". Two clicks. Simple and very effective remixing.
Figure 3. The Radio UserLand Aggregator - early re-blogging example
For the author, the remixing present started with the advent of RSS search engines (Feedster was one of the first in early 2003) and ended when Google introduced their RSS search engine (called Google Blog Search [http://www.google.com/blogsearch]) in September 2005.
In this era, RSS moved from being just thousands of blogs to being millions of blogs, and not just blogs but also thousands of newspapers, magazines and other more established online media outlets. No one knows for certain why this RSS explosion started, but here are two likely reasons:
• Established media and people noticed how effective RSS was after September 11, 2001 in bringing the conversation out and giving new perspectives on the news.
• Websites with RSS that update frequently have a higher search engine rank than websites without RSS.
The RSS search engines (Feedster [http://feedster.com/], Technorati [http://www.technorati.com/], PubSub [http://www.pubsub.com/], Blogdigger [http://www.blogdigger.com/] et al.) were a remixer's dream. Born out of a need to let people find what they were looking for in an enormous number of online, global, RSS-powered conversations, they allowed users to specify what they wanted and, more importantly, to set up an RSS feed of their search.
With RSS search engines, there was no need to search or surf the global conversation manually. Remixers set up RSS search engine feeds for the keywords, organizations and topics they were interested in, and the news came to them. For more information, refer to [How to be a NewsMaster], the author's primer for remixers on how to set up RSS search feeds on two of the leading RSS search engines, PubSub and Feedster. (The term [RSS NewsMaster] was invented by Robin Good [http://www.masternewmedia.org/] to describe professional RSS remixers; the author believes this role will become part of the job description of 21st century librarians.)
Here is an example Feedster search for XML Conference 2005. Note the orange XML badge at the top right next to "Page 1 of 6". This badge is a link to an RSS feed which remixers would subscribe to in their RSS reader.
Figure 4. RSS Search Engine Example - Feedster search for XML Conference Atlanta
There were other developments in this era as well, in the areas of re-blogging, RSS feed slicing and dicing and, most importantly, tagging.
Re-blogging increasingly became a normal feature of remixer tools, both in desktop applications (e.g. ecto [http://ecto.kung-foo.tv/] and MarsEdit [http://ranchero.com/marsedit/] on Mac OS X; Qumana [http://qumana.com/] and BlogJet [http://blogjet.com/] on Windows) and in web applications (e.g. Drupal [http://drupal.org], an open source content management system). Drupal's re-blogging feature can be seen in this screenshot from UrbanVancouver.com. The little "b" buttons function like the Radio UserLand "POST" buttons.
Figure 5. Drupal Aggregator - re-blogging example
RSS feed slicing, dicing, intersecting etc., or what Seb Paquet calls [Amateur RSS Bricolage], became available to remixers through web applications like FeedBurner [http://www.feedburner.com/], Blogdigger [http://www.blogdigger.com/] and the recently introduced FeedShake [http://www.feedshake.com/].
In this FeedBurner example you can see a personal blog feed (the author's blog) combined with the author's RSS feed of photos from flickr, an online social photo blogging service, into one feed containing both blog posts and photos.
Figure 6. FeedBurner - example of splicing two feeds together
The final RSS bricolage example in this paper is a Blogdigger group which allows you to combine any arbitrary feeds; not just your own feeds as in the FeedBurner example. A more complete list of RSS bricolage tools can be found in [Library clips - RSS Bricolage Tools List].
Figure 7. Blogdigger - example of splicing arbitrary feeds together
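The splicing shown in the FeedBurner and Blogdigger examples can be approximated in code. The Python sketch below is not how either service works internally (that is not public knowledge); it simply merges the items of any number of RSS 2.0 feeds into one new feed, newest first, using each item's pubDate. The feed contents, titles and the example.org link are invented, and every item is assumed to carry a pubDate ending in GMT.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def splice(title: str, link: str, *feeds: str) -> str:
    """Merge the items of several RSS 2.0 feeds into one feed, newest first."""
    items = []
    for feed_xml in feeds:
        items.extend(ET.fromstring(feed_xml).iter("item"))
    # pubDate uses the RFC 822 date format, which email.utils can parse.
    items.sort(key=lambda it: parsedate_to_datetime(it.findtext("pubDate")), reverse=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Spliced feed"
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical blog feed and photo feed.
BLOG = """<rss version="2.0"><channel><title>Blog</title>
<item><pubDate>Thu, 01 Sep 2005 09:00:00 GMT</pubDate><description>Blog post A</description></item>
<item><pubDate>Sat, 03 Sep 2005 08:00:00 GMT</pubDate><description>Blog post B</description></item>
</channel></rss>"""

PHOTOS = """<rss version="2.0"><channel><title>Photos</title>
<item><pubDate>Fri, 02 Sep 2005 12:00:00 GMT</pubDate><description>Photo 1</description></item>
</channel></rss>"""

spliced = splice("Blog and photos", "http://example.org/", BLOG, PHOTOS)
print(spliced)
```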
Finally, the last and probably the most significant development of the present era is that of tags.
Tags are the remixer's equivalent of keywords, and were pioneered by Joshua Schachter's online social bookmark service, del.icio.us (and similar services like Furl [http://www.furl.net/], Spurl [http://www.spurl.net/], etc.). Tags are simply one or more keywords that remixers add to describe bookmarks. These bookmarks are stored on the web (not on the user's local hard disk) and are by default available to all, which is why del.icio.us has been called a social bookmark service. Every tag has an associated RSS feed. Most importantly, remixers can mix and match tags' feeds to do RSS feed intersection. For example, as seen in the del.icio.us example figure, one can get an RSS feed for anything tagged with both "Atlanta" and "food", say if one were travelling to Atlanta for a conference.
Figure 8. del.icio.us - an example of using tags to splice the intersection of feeds together
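The tag intersection in the figure amounts to a set operation. The following Python sketch illustrates the idea on a small in-memory list of bookmarks; it does not use the del.icio.us service or its API, and the URLs and tags are invented.

```python
# Hypothetical bookmarks, each with a set of lower-case tags.
bookmarks = [
    {"url": "http://example.org/bbq", "tags": {"atlanta", "food"}},
    {"url": "http://example.org/aquarium", "tags": {"atlanta", "travel"}},
    {"url": "http://example.org/crepes", "tags": {"food", "sanfrancisco"}},
]


def tagged_with_all(bookmarks: list, *tags: str) -> list:
    """Return the URLs of bookmarks that carry every one of the given tags."""
    wanted = {t.lower() for t in tags}
    # A bookmark matches when the wanted tags are a subset of its own tags.
    return [b["url"] for b in bookmarks if wanted <= b["tags"]]


atlanta_food = tagged_with_all(bookmarks, "Atlanta", "food")
print(atlanta_food)
```

Only the bookmark tagged with both "atlanta" and "food" is returned; a service would then publish that result as an RSS feed.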
Tags have their problems (e.g. spam tags and the disambiguation of tags with multiple meanings, such as Maple the software [http://www.maplesoft.com/] versus maple trees). But they have a very low barrier to entry: tagging, unlike blogging, requires no analysis or commentary. Tags also allow remixers to bring the non-RSS part of the web into the global RSS conversation with very little effort. A more complete treatment of tags can be found in [Powerbloggers turning to tags].
WARNING: The following is the educated guesswork of somebody who has been blogging since 1999 and has been obsessed with RSS since 2001.
The future of remixing is now, or, to be more accurate, has already begun. In addition to that which is impossible to predict today, it will consist of tagging (as popularized by del.icio.us, flickr [http://flickr.com/] et al) which began in the previous era for a few and will continue to expand in the future for the masses and also:
[Microformats] from an RSS perspective are an attempt to make the "blob of text" in the RSS description field into something structured in a lightweight, human readable (in the same way that HTML is human readable) AND machine parseable representation using XHTML. Microformats build on the foundation of earlier efforts such as [Joe Reger's data blogging] and the [RVW RSS Extensions for Reviews].
Imagine if people could write restaurant reviews (refer to the example below of a restaurant review in the hreview microformat taken from the [hreview page on the Microformats wiki]) and those reviews were carried in a review microformat as part of an RSS item. Then it would be very easy to write tools that would enable remixers to aggregate, splice and dice the microformats that they care about (e.g. recipes, reviews, and events). This is the world enabled by the ubiquitous use of standard microformats with RSS.
Example 2. Restaurant review example in the hreview Microformat
<div class="hreview">
  <span><span class="rating">5</span> out of 5 stars</span>
  <h4 class="summary"><span class="item fn">Crepes on Cole</span> is awesome</h4>
  <span>Reviewer: <span class="reviewer fn">Tantek</span> -
    <abbr class="dtreviewed" title="20050418T2300-0700">April 18, 2005</abbr>
  </span>
  <blockquote class="description"><p>
    Crepes on Cole is one of the best little creperies in San Francisco.
    Excellent food and service. Plenty of tables in a variety of sizes
    for parties large and small. Window seating makes for excellent
    people watching to/from the N-Judah which stops right outside.
    I've had many fun social gatherings here, as well as gotten
    plenty of work done thanks to neighborhood WiFi.
  </p></blockquote>
  <p>Visit date: <span>April 2005</span></p>
  <p>Food eaten: <span>Florentine crepe</span></p>
</div>
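To show why this matters to remixers, here is a minimal Python sketch that pulls a few fields out of a shortened version of the review above. It assumes the markup is well-formed XHTML so that a plain XML parser can read it, and it flattens the format (for example, it treats "item" and "reviewer" as plain text and reads the machine-readable date from the abbr title attribute). It is an illustration of the idea, not a complete hreview parser.

```python
import xml.etree.ElementTree as ET

# Class names this sketch looks for; a simplification of the hreview format.
FIELDS = {"rating", "summary", "item", "reviewer", "dtreviewed", "description"}


def parse_hreview(xhtml: str) -> dict:
    """Extract a flat dictionary of review fields from well-formed hreview XHTML."""
    review = {}
    for el in ET.fromstring(xhtml).iter():
        # Collapse whitespace in the element's full text content.
        text = " ".join("".join(el.itertext()).split())
        for name in el.get("class", "").split():
            if name in FIELDS:
                # abbr elements carry the machine-readable value in "title".
                value = el.get("title") if el.tag == "abbr" else text
                review.setdefault(name, value)
    return review


REVIEW = """<div class="hreview">
  <span><span class="rating">5</span> out of 5 stars</span>
  <h4 class="summary"><span class="item fn">Crepes on Cole</span> is awesome</h4>
  <span>Reviewer: <span class="reviewer fn">Tantek</span> -
    <abbr class="dtreviewed" title="20050418T2300-0700">April 18, 2005</abbr></span>
  <blockquote class="description"><p>Excellent food and service.</p></blockquote>
</div>"""

review = parse_hreview(REVIEW)
print(review)
```

With the fields in a dictionary, filtering reviews by rating or grouping them by restaurant becomes ordinary data processing, which is not possible with an unstructured "blob of text".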
Remixers will benefit from prior art in artificial intelligence to use in analyzing the RSS "blobs of text" for filtering and other purposes. For example, the recently launched SearchFox [http://rss.searchfox.com/] uses machine learning algorithms. This trend will continue.
Another sort of prior art that remixers will benefit from is voice recognition which will be incorporated into remixer services for filtering of software generated transcripts of spoken word podcasts and videoblogs.
Developers have high quality open source software libraries for parsing RSS feeds, such as [Mark Pilgrim's Universal Feed Parser for Python] and the [Magpie RSS Library for PHP]. These libraries make RSS accessible to developers without XML domain knowledge (the only knowledge required is how to use arrays, which makes [RSS like a poor man's API]).
To date, however, there are no similar libraries for RSS remixing. Obviously, the RSS search engines have libraries that remix internally. In the future, open source libraries from the RSS search engines and others will emerge to make the building of software and services for remixers much easier.
Apple has already incorporated RSS into its web browser, Safari, on Mac OS X. Perhaps Apple will release an open source 'RSSKit' as they have done with the 'WebKit' which is part of Safari. RSS Support will be part of all operating systems in the future instead of being 3rd party add on libraries as it is today. At the time of the writing of this paper (September 2005), [Microsoft announced plans to build APIs for RSS remixing into Windows Vista], the next version of Windows. As a result, RSS remixing programs will become even more numerous on the Mac and Windows and of course, also Linux when RSS support is built into it.
RSS is well on its way to becoming ubiquitous and will become a lowest common denominator of data transport and interchange on the Internet. This means RSS will be everywhere and if something on the net is not available in RSS, it will be possible via web services for remixers to convert it to RSS.
Therefore, in addition to the straightforward integration of RSS in and RSS out of programs like [Microsoft Office 12], there will be some more interesting uses of RSS. One is for new types of software like the Social Office (a new term for a suite of programs and services for the new social services which are becoming the 'office of the web' like del.icio.us bookmarking, blogging, photo blogging with flickr, podcasting and videoblogging) and the other is for portals: personal portals and portals for business and organizations.
One of the harbingers of the Social Office is the [flock] web browser, which its developers have dubbed the "Social Web Browser" since it integrates, in an easy to use fashion, the new social features of the web such as photo blogging from flickr, blogging using standard APIs to your blog(s), social bookmarks like del.icio.us and more. What if flock incorporated RSS remixing so that remixers could do whatever they want with the RSS feeds they generate and the ones that they are interested in, and display the result in an attractive and usable manner?
In other words, what if remixers could, based on their remixed RSS feeds of interest, create their own customized portal? This is the premise behind [Marc Canter's vision of a digital lifestyle aggregator], which is for individuals remixing their feeds of interest.
The author believes that this is also the vision behind the [Tucows Start Service], which allows businesses to generate custom portals based on their RSS feeds of interest.
Furthermore, as RSS is used more and more for non-human events like industrial processes (e.g. number of widgets made in the last hour) and product development processes (e.g. checkins to a software development source code control repository), it becomes the universal container for enterprise information flows, and then [RSS will be the basis for a real-time enterprise console].
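As an illustration of RSS carrying non-human events, the following Python sketch turns a list of timestamped process events into an RSS 2.0 feed. The function name, the event data and the example.org link are invented, and timestamps are assumed to be in UTC.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def events_to_rss(title: str, link: str, events: list) -> str:
    """Build an RSS 2.0 feed with one item per (UTC timestamp, message) event."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Machine-generated events for {title}"
    for when, message in events:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "description").text = message
        # RSS 2.0 dates use the RFC 822 format; usegmt requires a UTC datetime.
        ET.SubElement(item, "pubDate").text = format_datetime(when, usegmt=True)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical events from a production line and a source code repository.
events = [
    (datetime(2005, 9, 1, 14, 0, tzinfo=timezone.utc), "412 widgets made in the last hour"),
    (datetime(2005, 9, 1, 14, 5, tzinfo=timezone.utc), "Checkin to repository: fix build script"),
]

console_feed = events_to_rss("Factory console", "http://example.org/console", events)
print(console_feed)
```

Any RSS reader or remixing tool could subscribe to such a feed exactly as it would to a blog.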
When RSS generation, aggregation and bricolage are built into everything, RSS will become invisible. If the infrastructure is in place everywhere to remix RSS, then tools and applications can be built that will enable users to concentrate on retrieving and sharing what they want without having to know the technology underneath. This is the ultimate future of RSS remixing. In the long term, like SMTP (Simple Mail Transfer Protocol), RSS will disappear from view and people will use it without knowing it is there.
Roland Tanglao graduated from the University of Waterloo with a degree in Systems Design Engineering. Working at Nortel Networks, he ran its first internal corporate blog focusing on developer relations. Roland has been blogging since 1999, and was the first business blogging consultant in Canada. Roland is one of the founders of Bryght and as Bryght's Chief Blogging Officer, he reads hundreds of blogs daily through his RSS reader and participates in many online communities. He is an expert community manager, with UrbanVancouver.com and his personal restaurant review site, VanEats.com, being the two best examples.