Tuesday, 19 October 2010

Using linked data to build context around news

In a post a few weeks ago on the excellent R4isStatic, the BBC's Paul Rissen outlined an inspiring vision of how news could be enhanced by embracing the possibilities of the web as a medium. Hold on, I hear you cry, surely news has been there, done that and is now hurrying off behind a paywall again? Hasn't the web transformed the way organisations distribute news, people consume it and even, through social media, the way that news is gathered in the first place? Well, yes, yes and yes - but.

Paul makes the point that while the inputs and the outputs might have changed, the product itself remains remarkably similar. A story appears, without context, designed to simplify and dramatise a set of issues enough that readers, viewers or listeners will be quickly satisfied. Paul outlines how a data-driven, semantic approach to news might situate it in a wider and more meaningful context, drawing out the latent power of the web to deepen and enrich our understanding. (In so doing he has some trenchant observations to make about how it's the conservatism of journalists rather than new technology that's responsible for our supposedly shrinking attention spans.) I urge you to read the whole post and subscribe to the blog.

And I mention it now because yesterday the Guardian announced that its Open Platform Content API will allow users to query its data by MusicBrainz IDs and ISBNs. (I have to declare an interest: having overseen the BBC's adoption of MusicBrainz IDs as Interactive Editor for Music, I'm delighted to see another major content player take them up. Like any standard, an open data standard such as MusicBrainz stands or falls by how widely it is adopted.)

It's worth pointing out that this is essentially a developer-facing initiative, as a glance at the FAQs will make clear. So what's the connection with Paul Rissen's vision of linked-data-driven news? One objection to that vision is that plenty of context is already provided by the manual linking and proprietary tagging that journalists carry out. And the Guardian's link-up with MusicBrainz and ISBNs doesn't immediately change the content or links on their own pages, which usually provide a great deal of context (again, manually), particularly in the music area.

But the context that exists now rarely goes beyond a point-to-point link or an index page, and it is very largely contained within the originating organisation. That limits its capacity to change or broaden readers' understanding of the concepts involved.

What the Guardian has done will start to enable third parties to build automatically on the context provided by their stories and the metadata around them, and (in theory at least) vice versa. It points the way towards a much more integrated web of information around topics, one that might fundamentally change the experience and expectations of news consumers. And (again, in theory at least) context could move beyond point-to-point links and begin to include visualisations of the information around concepts such as artists or books. (We did this very thing at the BBC, albeit in a very rudimentary way, with play counts for artists.)

Of course this all raises a whole set of questions about business models. Paywalls represent a fundamental challenge to the viability of building open data around news (and vice versa). And thoughts about revenue models around linked open data are in their infancy. So it's a good job our own Simon is in a position where he can move some of these conversations on.