Why Microformats? An introduction to Microformats.

Introduction to Microformats

Microformats are small and gentle syntactic touch-ups for your web pages. They have one major purpose: to make your data readable by both man and machine. They are the technical diplomats of the Web, allowing the same piece of data to be shared among many applications and people.

What is more, they do this in an easy and pragmatic way.

A Brief History

The machine-readable-data (and thus the microformat) concept is not new; it has a very recent forebear: the Semantic Web. The Semantic Web has the same basic ideal – to share data with machines – and the benefit of several years of research and some of our brightest minds. Despite this, it is the albatross that simply cannot take off. It has stalled.

The Promises of the Semantic Web

The computer could harness knowledge of you, and the people you deal with, to help you efficiently side-step life’s mundane tasks. A simple example is the dentist appointment. You tell the computer you need to see the dentist, and it swishes off to look at your profile (when is your preference?), your calendar (are you free?) and the dentist’s calendar (are they free?) and automatically books your appointment.

Using the concepts of a ‘person’, their work, and their relationships, the Semantic Web would be able to do a better job of inferring trust over the Web, making it easier to find reliable sources.

Search would be revolutionised, as all data would be categorised and linked in a way the search engine could meaningfully understand. A search for “Show me an image of a red apple from Chile” would give you precisely that.

Why is it Failing?

  • It is complex to understand and implement.
    RDF, OWL, DAML+OIL, FOAF, Ontology, Topic Map, Semantic Spectrum, Description logic… the list goes on. The words are frightening enough, but work with the Semantic Web for a while and they rapidly become synonymous with overwhelming confusion. The complexity partly stems from it being machine-orientated: it is asking us to work for the machine, rather than the machine to work for us.
  • It is difficult to use.
    BigBlogZoo is a high quality consumer semantic web browser, but even its screenshots scream “I WILL OVERWHELM AND INTIMIDATE”. Other technologies that would support it are also lacking. “Show me an image of a red apple from Chile” would require a very good natural language interpreter, which simply does not exist yet. Instead, we are forced to make our query something the computer can understand – think Google’s Advanced Search, only significantly more tedious.
  • It requires duality.
    A ‘human’ perspective and a ‘machine’ perspective need to be maintained for the same information. The majority of us struggle to maintain even one version effectively. Think of your calendar or diary – for short-term tasks that you can remember, you are often loath to transcribe them to a diary.
  • It is ‘all or nothing’ in its approach.
    As data must be rewritten, we have to wait for a sufficient volume of information producers and consumers to be using the Semantic Web before it is useful. As if this is not problematic enough, the standards are still being revised. Just like with VHS vs. Betamax in the 80s, it is hard to fully commit when the foundations are not solid.
  • The landscape has changed.
    Google does a good job of data recovery, and tagging has risen to assist with both classifying data and retrieving it. While they are not perfect, they are ‘good enough’ to keep the masses happy. Given the choice between ‘good enough’ and extra complexity, people will choose good enough.

In summary, the Semantic Web is too complex in implementation, and too optimistic in its ideals, to be useful to the masses right now.

While the bulk of the Semantic Web is languishing on the launch pad, one small part did successfully lift off: RSS.

Why did RSS succeed?

  • Simple to use.
    A tiny sprinkling of descriptive information turns your News into RSS.
  • Narrow focus that is easily understood.
    It has the ‘getable’ factor – we all understand the concept of headlines. It relates to the real world; it is human-orientated.
  • It is standalone.
    You do not need to use complex systems to publish it, or provide them to benefit from it. By simply dropping it into your page you are saying “here, benefit from this if you need it”.
  • The machine-readable output directly benefits humans.
    Software can gather and recombine multiple RSS feeds to bring you only the stuff worthy of your attention.
  • The data-duality problem is sidestepped.
    Most RSS output, such as a blog, is automatically generated from the core data by software. It requires no effort to maintain.

Microformats continue the trend of RSS. They do not even pretend to try to solve complex problems like trust and common understanding. Microformats’ greatest advantages are simplicity and usefulness.

Microformats

A microformat is a simple addition to your regular HTML, using the class, rel, and rev attributes.

They come in various flavours, depending on what the data is about. A handful of the most well established microformats are:

  • hCard
    Business card style information about a person.
  • hCalendar
    Description of time-based events.
  • hReview
    Format for reviews.
  • XFN
    XHTML Friend Network: representing social relationships using regular hyperlinks.
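
As a rough sketch of how the rel and rev attributes mentioned above carry meaning, here are three ordinary links. The example.com URLs and link text are placeholders. Besides XFN, the sketch borrows rel-tag and VoteLinks, two formats not otherwise covered in this article, purely to show the attributes in use.

```html
<!-- Illustrative sketch only: all URLs and link text are placeholders. -->

<!-- rel: says what the linked page is in relation to this one.
     XFN: the linked page belongs to a friend I have met in person. -->
<a href="http://example.com/jane/" rel="friend met">Jane</a>

<!-- rel-tag: this page is tagged with the last path segment of the link,
     here "microformats". -->
<a href="http://example.com/tags/microformats" rel="tag">microformats</a>

<!-- rev: says what this page is in relation to the linked one.
     VoteLinks: this page casts a vote for the linked page. -->
<a href="http://example.com/a-good-article" rev="vote-for">a good article</a>
```

In each case the link still looks and behaves like any other hyperlink; only the attribute value is new.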

Technical example of an hCard:

<!-- "vcard" marks the root element of an hCard -->
<div class="vcard hidden">
    <!-- "fn" is the formatted name; "url" is the person's home page -->
    <a class="url fn" href="http://bumblesearch.com/bsearch/about/andy_mitchell">Andy Mitchell</a>
    (<span class="nickname">Andy</span>)
    <div class="title">Consultant &amp; Developer</div>
    <a href="http://www.gtdgmail.com" class="org">GTDGmail</a>
    <div class="email">andy@gtdgmail.com</div>
    <!-- "adr" groups the parts of a postal address -->
    <div class="adr">
        <span class="locality">Leeds</span>
        <span class="region">Yorkshire</span>
        <span class="country-name">UK</span>
    </div>
</div>

Herein lies the magic. This is a regular, displayable piece of HTML, and yet it is now ready to be interpreted by a machine. This is all thanks to a previously agreed-upon syntax.

The community is active and welcoming, and many more microformats are being proposed and used.

Why are Microformats Primed for the Big Time?

Microformats have developed in a very organic way, with much grassroots support. Whether intentionally or not, they have also learnt from the Semantic Web’s mistakes. Theirs is a guerrilla approach, sneaking stealthily onto the internet, gently working with us rather than against us. For the metaphorically minded, a microformat is a loyal sheepdog helping you round up your existing data, whereas the Semantic Web is an 800lb gorilla smashing up your info and grunting ‘my way’.

Genuinely Useful Qualities

  • Simple to create.
    Microformats are easy to learn and super easy to implement. In many cases, a quick UI tweak of your blog/CMS software will provide microformats across your entire site.
  • Single copy of information.
    Your existing data has microformat info added to it. You do not need to create or maintain one ‘human’ copy, and one ‘machine’ copy.
  • Standalone Concept.
    You can start with a single microformat, then extend your use organically, in line with your needs, ideas and growth. There is no need to do everything straight away.
  • Immediately useful.
    Microformats are useful today. Firefox extensions and Technorati search engines are just some of the ways they can be consumed. You do not need to wait for critical mass for them to become worthwhile.
  • Open standards.
    The standards are created and improved by the community that uses them. There is no ulterior business motive to their development. Microformats will not disappear if a company goes bankrupt.
  • Machine Readable - Microformat as an API.
    By dropping in some microformat data, your work becomes machine readable. It is, in effect, providing an ‘API’ to your data, allowing other software to consume, aggregate and display your data in new and interesting ways.

Mainstream Acceptance

It is not just the people on the ground who are adopting microformats.

  • Bill Gates and Tim O’Reilly recently jousted over them at Mix06. Tim concurred that microformats are driving the semantic web, and Bill called for more usage and further agreement.
  • Technorati has built a microformats search engine that allows you to search over contacts, events and reviews. Tantek, Chief Technologist at Technorati, is one of the prominent promoters of microformats.
  • LinkedIn has converted its profiles, and is pushing for an hResume format.
  • Flickr has adopted the geo microformat for identifying the locations of photos.
  • Yahoo Local heavily adopted microformats in all its listings, instantly creating a super resource of microformatted information (rather sneakily too: a developer quietly added it in; but fortunately the executives soon saw the advantages).

With strong global support, microformats have momentum. This support will lead to more data, which will lead to more tools, which in turn will lead to more data – we are just seeing the start of the avalanche.

A Bright Future

A beautiful property of microformats is that they are non-competitive. Once data is readable with a machine, then it becomes trivial to convert it to whatever standard rises to prominence in the future.

This linear path from our current position (commonly called Web 2.0) to a fully semantic Web (commonly called Web 3.0) would make microformats the basis of a Web 2.5. Thus, adding microformats to your data will never be considered a waste of time. It will not become a dead-end. It is not a Betamax.

Interestingly, microformat data is theoretically consumable by current Semantic Web tools (which have the benefit of many years of heavy research and development).

Data Conversion Between Microformats and the Semantic Web

Without becoming too technical, the building block of the Semantic Web is RDF. RDF makes statements about a resource in subject-predicate-object form, which is essentially what a microformat does (e.g. this data is my age, this link is about me). Given that both are readable by machine, and both have the same fundamental model, microformat data can be easily absorbed into Semantic Web formatted data.

One example is that a university could let its departmental people outwardly publish their bio data in a microformat, but inwardly pull that data into its semantic knowledge base for its internal tools.

Near-Future Uses of Microformats

These uses are not in place yet, but they soon will be.

Make the Web More Spontaneous

Microformats will simplify the sharing of information between the Web and other applications.

Take these two examples:

  • You see the bio of a person on a webpage you wish to remember, so you drag their name out of the browser, and drop it into your Address Book (e.g. Outlook). As the data in the hCard is machine readable, the Address Book can easily interpret it.
  • You see an event on a webpage you wish to attend, so you drag it out of the webpage and into your Calendar, or a web service like Upcoming.org. The system could even tell you, at a glance, whether that particular event already clashes with something else you have planned.
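
For the second scenario, this is a sketch of the kind of hCalendar markup a calendar application would read. The event, venue and URL are invented for illustration; the class names (vevent, summary, dtstart, dtend, location) are the ones hCalendar defines.

```html
<!-- Illustrative sketch only: the event details and URL are placeholders. -->
<!-- "vevent" marks the root element of one hCalendar event -->
<div class="vevent">
    <a class="url summary" href="http://example.com/events/web-meetup">Web Meetup</a>
    <!-- the title attribute holds the machine-readable date and time;
         the element text is what the human reader sees -->
    <abbr class="dtstart" title="2006-09-08T18:30:00">8 September, 6:30pm</abbr>
    to
    <abbr class="dtend" title="2006-09-08T21:00:00">9pm</abbr>
    at <span class="location">Leeds Town Hall</span>
</div>
```

With machine-readable start and end times, a calendar only needs to compare this interval against its existing entries to flag a clash.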

Vanity

Ensuring both you and your work are more easily found in search engines.

Exposure in Search Engines
Information with embedded microformats will help existing search engines better assess your web pages, and thus promote them more accurately. New search engines built specifically for microformats will also appear, providing you with more means of discovery.

Technorati already specifically indexes and searches over microformat data such as people, events and reviews. The more things microformats can describe, the more a search engine will rely upon them. Google and Yahoo will almost certainly have to make use of them to remain competitive.

New Search Engine Features
New variations on searching may appear within engines like Google. For example, when searching a place (Herb’s Happy Pizza), the search results could reveal reviews of the venue and upcoming events there, using hReview and hCalendar.

Screenshot showing Review and Calendar information within Google's search results.
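
To give a feel for what a search engine would read in that scenario, here is a sketch of an hReview for the venue. The reviewer, rating, date and wording are invented for illustration; the class names (hreview, item, fn, summary, rating, reviewer, dtreviewed, description) are the ones hReview defines.

```html
<!-- Illustrative sketch only: reviewer, rating, date and text are placeholders. -->
<!-- "hreview" marks the root element of one review -->
<div class="hreview">
    <!-- "item" is the thing being reviewed; "fn" is its name -->
    <span class="item"><span class="fn">Herb’s Happy Pizza</span></span>
    <span class="summary">Friendly service and a crisp base</span>
    <!-- hReview ratings run from 1 to 5 by default -->
    Rating: <span class="rating">4</span> out of 5
    <!-- the reviewer is itself a small hCard -->
    Reviewed by <span class="reviewer vcard"><span class="fn">A. Reviewer</span></span>
    on <abbr class="dtreviewed" title="2006-09-01">1 September 2006</abbr>
    <div class="description">Arrived hot, and the toppings were generous.</div>
</div>
```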

Suppose someone stumbled across an article, “Case Study of Mongolian Hamsters”. The browser could automatically reveal the author and their other web pages.

  • The visitor is presented with more useful information.
  • The author no longer has to explicitly maintain links to all their work.

Similarly, this ‘connected’ information could be revealed in Google’s search results:

Screenshot showing Author information within Google's search results.

Explore Personal Networks

Current social networks – MySpace, Facebook, LinkedIn – are the Web’s hot properties. However, they have two major drawbacks:

  • You and your friends have to agree which to join.
  • You cannot easily reuse that data elsewhere (it is trapped in a data silo).

Great as they can be, they are just a simple part of the transition until we have a Web where the relationships between people are implicit yet precise. There will be no data silos, just the Web itself. New websites and tools (perhaps built on browsers) will use this mass Web data to provide social networking functionality.

Microformats are the glue that will express the relationships between people and ‘things’ (web pages containing people, places, work, documents, images etc.) by embedding this data directly into hyperlinks, enabling this ubiquitous social network to happen.

A current early realisation of this is the backnetwork built for the dConstruct conference. The backnetwork will express relationships between people using simple hyperlinks (XFN), and aggregate Flickr and blog posts related to a person into their profile. This is all done automatically by the machine. Future systems will be able to link together personal data on the Web much more effectively, as it will be marked up with microformats.
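
A sketch of the kind of markup such a network could build on, using XFN values on ordinary links. All names and URLs here are placeholders, and the profile page itself is hypothetical.

```html
<!-- Illustrative sketch only: a hypothetical profile page with placeholder links. -->
<ul>
    <!-- rel="me": the linked page belongs to the same person as this page,
         so an aggregator can gather their blog posts and photos together -->
    <li><a href="http://example.com/blog/" rel="me">My blog</a></li>
    <li><a href="http://example.com/photos/" rel="me">My photos</a></li>

    <!-- relationships to other people, expressed as XFN values -->
    <li><a href="http://example.org/jane/" rel="friend met colleague">Jane</a></li>
    <li><a href="http://example.net/sam/" rel="acquaintance">Sam</a></li>
</ul>
```

Because the relationships live in the links themselves, any tool that can fetch the page can follow them, without the data being held by a single social network.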

So, who will build all this?

No one and everyone! It will be built and popularised by the ground troops. By us. Microformats and their success will come from pockets of global innovation – like sulphur bubbles rising out of a lake to form a complete cloud – someone will extend a format, another will create a tool, others will adopt them in their web pages and the virtuous circle will continue to give.

In Conclusion

We have known for many years where the Web is going, and what we want from it. We want it to truly fulfil the dream of augmenting and improving our lives with minimal effort, by helping us organise, promote, and educate ourselves; as well as establishing new relationships.

The problem is we chose the wrong vehicle to take us there. In the Semantic Web we chose a scramjet, something with a glistening future but that is too complex and impractical for use outside of academia. Microformats, on the other hand, have come like a shiny new bus to scoop up the masses and get us moving.

It is up to us, the people with the websites, the people who control the data, to embrace the vision and ensure it becomes a useful reality. One piece of HTML at a time…