Dave Johnson on open web technologies, social software and Java
This is the third in my series of Web Integration Patterns. Check out the intro at this URL http://rollerweblogger.org/roller/entry/web_integration_patterns
Enable easier integration and better search across integrated web applications and sites by using standard mechanisms (e.g. Microformats, RDFa) to embed property values in HTML pages.
In this pattern, you define different types of resources or “objects,” and the properties that may be associated with each, or you adopt an existing vocabulary. You also define a way to embed type information and property values into HTML, either by embedding name and value pairs in a way that is invisible to the user or by marking up strings of visible text as property values, or you adopt an existing technique.
For example, a cooking web site could share recipe data using this pattern by presenting each recipe with embedded property values, making it easy for other web applications, mash-ups and search engines to parse out the different parts of the recipe: title, ingredients list, list of steps, etc. This is a real example, by the way: Google indexes recipe data in HTML pages marked up in either Microformat style or RDFa.
If you present each web page as a resource with a type, or multiple types, and embedded property values, then you don’t need to define and provide a special “REST API” to allow other web applications to access your data.
This pattern can enable more powerful search capabilities across integrated web applications because search engines can be aware of, and provide special indexing for, those different types of resources and properties.
There are at least three different standard ways to embed data in HTML: Microformats, HTML Microdata and RDFa, each of which I’ll explain below.
According to Wikipedia “Microformats emerged as part of a grassroots movement to make recognizable data items (such as events, contact details or geographical locations) capable of automated processing by software, as well as directly readable by end-users.” A community grew around Microformats and this community has worked to define Microformats for a variety of use cases. The microformats.org web site lists nine microformats, including hCard for sharing contact information, hCalendar for sharing events and hReview for sharing movie and book reviews.
The basic idea of Microformats is to use existing HTML element attributes, specifically rel and class, to convey types of things and properties of those things. Here’s an example from Wikipedia: to represent information about a geographic location in HTML, the Geo Microformat uses the syntax below:
The birds roosted at <span class="geo"> <span class="latitude">52.48</span>, <span class="longitude">-1.89</span> </span>
The outermost span’s class attribute indicates the type "geo" and the inner spans carry the "latitude" and "longitude" property values.
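To show how a consuming application might read this markup, here is a minimal sketch in Python using only the standard library's html.parser module. The names GeoParser and parse_geo are my own inventions for illustration, and the sketch only looks at span elements, so treat it as a starting point and not as a complete Microformats parser.

```python
from html.parser import HTMLParser


class GeoParser(HTMLParser):
    """Collects latitude/longitude text found inside a span with class "geo".

    Hypothetical helper for illustration; it only tracks span elements.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # class lists of the currently open span elements
        self.geo = {}    # property name -> text value

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self.stack.append(classes)

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not self.stack:
            return
        # A property only counts when some enclosing span has the "geo" type.
        inside_geo = any("geo" in classes for classes in self.stack[:-1])
        for name in ("latitude", "longitude"):
            if inside_geo and name in self.stack[-1]:
                self.geo[name] = self.geo.get(name, "") + data.strip()


def parse_geo(html):
    """Return a dict of the geo property values found in the given HTML."""
    parser = GeoParser()
    parser.feed(html)
    parser.close()
    return parser.geo


snippet = ('The birds roosted at <span class="geo"> '
           '<span class="latitude">52.48</span>, '
           '<span class="longitude">-1.89</span> </span>')
print(parse_geo(snippet))
```

Because the type and the properties are ordinary class values, the same page still renders normally for a human reader while a program like this one picks out the coordinates.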
If you want to take advantage of Microformats in web application integration, then you pick one of the existing formats or you invent a new one, ideally by working with the Microformats community to do so.
Another approach for implementing Embedded Properties in HTML comes from HTML itself, in the form of HTML Microdata.
HTML Microdata is a W3C working draft, a specification that defines ways to embed property values in HTML. As Mark Pilgrim explains in Dive Into HTML5, HTML Microdata “annotates the DOM with scoped name/value pairs from custom vocabularies.” This is very similar to what Microformats do, but instead of “overloading” the class attribute, HTML Microdata adds some new HTML attributes to enable embedded property values.
HTML Microdata enables you to mark up an HTML element as an item that contains properties. You do this via the itemscope attribute; you indicate the item’s type via the itemtype attribute, and you indicate each property via the itemprop attribute. Here’s an example, also from Pilgrim’s book:
<TABLE itemscope itemtype="http://data-vocabulary.org/Person"> <TR><TD>Name<TD>Mark Pilgrim <TR><TD>Link<TD> <span itemprop="url"> <A href=# onclick=goExternalLink()>http://diveintomark.org/</A> </span> </TABLE>
The table element defines the scope of the item, and the type of the item is the URL http://data-vocabulary.org/Person, which indicates that the item represents a person. The enclosed span element carries the “url” property of the person.
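Here is a rough sketch of how a program could pull the item type and property values out of that markup, again using Python's standard html.parser module. MicrodataParser and parse_microdata are hypothetical names, and the sketch makes simplifying assumptions: it reads a single flat item and always takes a property's value from its text content, whereas the Microdata specification takes the value from an attribute for some elements (for example href when itemprop sits on a link).

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Simplified sketch: one flat item, property value = text content."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._prop = None   # name of the property currently being read
        self._tag = None    # tag name of the element carrying that property
        self._depth = 0     # nesting depth of same-named tags inside it

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and self.item_type is None:
            self.item_type = attrs.get("itemtype")
        if self._prop is not None:
            if tag == self._tag:
                self._depth += 1
        elif "itemprop" in attrs:
            self._prop, self._tag, self._depth = attrs["itemprop"], tag, 1
            self.properties[self._prop] = ""

    def handle_endtag(self, tag):
        if self._prop is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                value = self.properties[self._prop].strip()
                self.properties[self._prop] = value
                self._prop = None

    def handle_data(self, data):
        if self._prop is not None:
            self.properties[self._prop] += data


def parse_microdata(html):
    """Return (item type, dict of property values) for the first item found."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


snippet = ('<TABLE itemscope itemtype="http://data-vocabulary.org/Person"> '
           '<TR><TD>Name<TD>Mark Pilgrim <TR><TD>Link<TD> '
           '<span itemprop="url"> '
           '<A href=# onclick=goExternalLink()>http://diveintomark.org/</A> '
           '</span> </TABLE>')
item_type, properties = parse_microdata(snippet)
print(item_type, properties)
```

The parser never looks at class values here; everything it needs comes from the dedicated itemscope, itemtype and itemprop attributes.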
It’s good to have a standard way to encode types, scope and property values in HTML, but you also need standard vocabularies of types and properties to make this approach successful. There are some HTML Microdata vocabularies defined at data-vocabulary.org and HTML Microdata can also be used with RDF, opening up a huge number of vocabularies.
RDFa is a W3C recommendation, a specification that defines ways to embed RDF data in HTML and XHTML. Like HTML Microdata, RDFa adds new attributes to HTML to enable this. Let’s look at an example from the Wikipedia page on RDFa, which uses two of these attributes, about and property:
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="http://www.example.com/books/wikinomics"> <span property="dc:title">Wikinomics</span> <span property="dc:creator">Don Tapscott</span> <span property="dc:date">2006-10-01</span> </div>
The above example embeds some property values about the book Wikinomics, which is identified by the URL http://www.example.com/books/wikinomics. The example provides title, creator and date values for that resource. It doesn’t say what type of resource exists at the URL, but it could.
The advantage of the RDFa approach is that it ties in with the growing RDF / Linked Data movement, and the huge number of vocabularies from almost every field of endeavor. RDFa has some momentum and has been adopted by Facebook in its Open Graph protocol and by Best Buy for exposing catalog data in standard ways. However, HTML Microdata can also be used with RDF, so RDFa is not the only way to go if you favor Linked Data.
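To make the RDF connection concrete, the sketch below turns that markup into (subject, predicate, object) triples with Python's standard html.parser module. RdfaParser and parse_rdfa are names I made up for illustration. The sketch assumes that prefixes are declared with xmlns: attributes, that each property element contains only text, and that the most recent about value is the subject, so it covers only a small slice of what the RDFa specification defines.

```python
from html.parser import HTMLParser


class RdfaParser(HTMLParser):
    """Simplified sketch: handles only xmlns: prefixes, about and property."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}      # prefix -> namespace URL
        self.subject = None     # value of the most recent about attribute
        self.triples = []       # (subject, predicate, object) tuples
        self._predicate = None  # predicate of the property being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self._predicate = self._expand(attrs["property"])
            self._text = []

    def handle_data(self, data):
        if self._predicate is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes a property element holds only text, so the next end tag
        # seen is the one that closes the property element.
        if self._predicate is not None:
            value = "".join(self._text).strip()
            self.triples.append((self.subject, self._predicate, value))
            self._predicate = None

    def _expand(self, curie):
        """Expand a prefixed name such as dc:title into a full URL."""
        prefix, sep, local = curie.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return curie


def parse_rdfa(html):
    """Return the list of triples found in the given HTML."""
    parser = RdfaParser()
    parser.feed(html)
    parser.close()
    return parser.triples


snippet = ('<div xmlns:dc="http://purl.org/dc/elements/1.1/" '
           'about="http://www.example.com/books/wikinomics"> '
           '<span property="dc:title">Wikinomics</span> '
           '<span property="dc:creator">Don Tapscott</span> '
           '<span property="dc:date">2006-10-01</span> </div>')
for triple in parse_rdfa(snippet):
    print(triple)
```

Expanding dc:title into a full URL is what lets the data line up with any other source that uses the same Dublin Core vocabulary.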
We've talked about three different ways to implement Embedded Properties in HTML, so which one do you choose? It's not clear to me whether one of these techniques will dominate. Google is hedging its bets, so to speak, by indexing all three types of data. If you don't already have a favorite, one way to choose is to look at what others in your community, company or "industry vertical" are doing. The network effect applies here, so if one technique is already favored among web applications or sites in your space, then go with that.
Next up: Resource Preview