
Entries in linked open data (24)

Monday
Jan 30, 2012

More on Simon's upcoming work with CIKTN

I wrote a couple of months back about year two of my stint as the Creative Industries Knowledge Transfer Network's Metadata Champion. Here's a little interview with me that the CIKTN filmed to provide more detail, or at least to put a face to all the blog posts.

Meet The Team - Simon Hopkins from Creative Industries KTN on Vimeo.

Friday
Oct 28, 2011

Green shoots of linked (open) data around news

Based on Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch - some rights reserved (CC-BY-SA)

A couple of weeks ago Justin and I attended an event hosted by Talis, Data on the Web: The Benefits of Linking. It was a useful session all round (and if you are anywhere near Birmingham on 10 November, I note that another one is planned for there and then). We were particularly taken by this presentation by Jared McGinnis on the Press Association's thinking behind its recent publication of its ontology, and the part the ontology is set to play in the PA's strategy. This points to the commercial value of linked data - not only linked, but linked open data, and open in the read and write directions: PA has openly published its ontology, and it is ingesting geonames to provide IDs around location. The commercial angle is particularly significant, as the argument for linked open data is often couched in the language of public good, and often advocated by public institutions like government or the BBC. PA's advocacy - and we hope proven commercial success will follow - has the prospect of being something of a game-changer in perceptions of the value of semantic approaches. (Incidentally Talis' Tim Hodson provided a nuanced and useful explanation on their blog last month of the difference between publishing data, which often has a proprietary commercial value for its owners (or indeed is not always fully theirs to give away) and publishing the data model.)

But speaking of the BBC and the public service argument for linked open data, it is heartening indeed to see that the BBC is recruiting a data architect for News. I've linked before to Paul Rissen's article of a year ago eloquently setting out the case for the public value of this stuff, and it now looks as though he and like-minded colleagues are really being listened to inside the organisation.

I'm tuned into what's going on in news at the moment on account of my work programme managing the relaunch of ITV's News, Sport & Weather service, but there is clearly wider significance to all of this. I still hold out hopes for linked open data becoming a standard in the music industry; unfortunately the adoption of MusicBrainz that I oversaw as editor of the BBC Music website has yet to travel much further (beyond patchy adoption by last.fm), despite the benefits that I believe could accrue to record labels in particular from adopting such an open standard. As Simon pointed out recently, the ideal of the semantic web still feels like an idea whose time is coming, but we do need to start seeing proven commercial models if it is to make it into the mainstream.

Friday
Sep 9, 2011

Meditation on memory and data

Let me start with an embarrassing admission. In all the months I have been turning over in my mind the relationship between memory and structured data, it occurred to me only for the first time the other day that the word "memory" has a very specific meaning in the field of computing, and one so closely connected to data as to be almost synonymous. More precisely, memory is a measure of the amount of room for data, and data is everything that populates memory.

So we have - in barely the space of a generation - grown so used to this elision that it has become invisible (at least to me). But on closer examination it is really peculiar. Memory - of the traditional human variety - is slippery and organic, as any legal expert will testify. It inhabits a dimension of vast, multifaceted complexity, where the storage of facts is tightly wound together with the senses and the emotions, so that the first wintry day or the famous bite of a madeleine can unlock access to recall of events and feelings otherwise completely out of reach to the rational mind. Computer memory, on the other hand, is binary (like all else computerish). The data is either there or it isn't. At its most coy, it might hide in a disk partition that is invisible to a standard search routine, but it isn't going to wait for the appearance of a lost chord to show its face. And when corrupted, it gets corrupted in straight lines and blocks, like this picture.

When human memory gets corrupted (as it does all the time), it does so in the way the mind dreams - in a way that is probably irreducible to verbal explanation or digital representation.

So what? Words evolve, and it's natural that technologies that transform our understanding of culture and human experience will call forth new vocabulary and forge new meanings out of old words. Memory is just one of a long list (icon, avatar, bug, application, folder, browse, to cite a few random examples) of words that have been transformed, revived, debased or ennobled in the crucible of new technology. But the significance of this change is what it tells us about the human experience of memory, our habits and expectations of it, and how they have been changed by computers. What are the implications?

1. "The data is either there or it isn't." We have grown used to trusting the fidelity of computer memory over our own, and so we entrust our information to it with great confidence. To some extent we have always done this with documentary technology, ever since we moved from the oral to the written. But we don't expect computer memory to yellow, rot or burn. We know that computers break and get lost or stolen - but we also know that we can hedge against this by backing up.

2. Because we trust the relative robustness and fidelity of computer memory, we have become more blasé about the challenges of filtering and retrieval. We breathe a subconscious sigh of relief and hand over the heavy lifting of our own personal knowledge economies to the machines. We're in danger of losing sight of the fact that memory is not the same thing as remembering. It is still up to us to know where and how to look (crafting the right search term, navigating our folder structures). Even if the data is there, we still need to know it's there in order for it to be useful.

3. What is true for private memory is doubly so for the public kind - the inherited knowledge of our societies and cultures. The Internet is our new global memory, a collective consciousness that throws up wonders like Wikipedia or the Khan Academy, each testifying to the inspiring power of visionaries and networks, but also to the commodification of mere knowledge.

I've blogged before about the treacherousness of these first two points (and Justin has picked up the related problem of the many ways not to know). But let's drill down into some of the problems around that last point. True, we are building up an astonishingly comprehensive and reliable databank that is increasingly accessible any time and anywhere ("martini knowledge" if you like, to adapt Ashley Highfield's unfortunate coinage). And we are also finding all kinds of useful ways through it, via algorithmic or social search. Our digital remembering increasingly blurs the private with the public, as we outsource the storage and filtering of our memories to friends and acquaintances on social networks. Not only the storage but also the processing of knowledge and memory is getting digitised apace.

But there's the rub: what seems to be, and is sometimes said to be, our global brain has nowhere near the sophistication and power of our apparently puny individual brains to connect and analyse. We are tempted to trust to the cloud to do our knowing for us so we can get on with the more glamorous business of commentary, discussion and mashup. But if our brains don't have the facts they can't process them. Having access to a computer that knows what happened in Afghanistan over a century ago does not give you the vital historical perspective that simply knowing those stories yourself gives you. And that matters because it is still human beings, with their onboard knowing and processing, that make the decisions.

I haven't yet figured out what all this tells me - all comments gratefully received. But here are a couple of tentative, and hopefully complementary, conclusions:

1. We need to be a bit careful in how we re-evaluate what it means to learn. Knowing facts feels like a lower order of intellectual activity than processing them. Having pretty much ubiquitous access to calculators (on our computers and phones) means that you will never again need to do long division. It's tempting to leap by analogy to the idea that having access to the "global brain" means that you will never again need to commit a fact to your own memory. Why spend valuable time and mental energy on memorising when it could go on analysis? But learning how to think about things is still nowhere near a substitute for knowing them, as individuals. More than that - you need to know things in order to form the context to think about other things.

2. That said, we needn't retreat from the enterprise of making the machines help us think. This is where semantic technologies such as dbpedia should help supplement the current emphasis on folksonomy, term extraction and facial recognition. For example, tagging stories of events present and past in Afghanistan with "adventure" and "failure", situating those concepts in intellectual and emotional contexts that connect them with similar and opposite concepts and making such connections readily available as links wherever stories appear, should act as a valuable supplement to the vital but hazardous and selective business of human remembering, retrieval and reflection. (For starters, I still like a lot what Paul Rissen had to say a year ago about the potential for application of linked data to the production of news, and I'm waiting to see who's going to pick this up and run with it.)
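To make that second point a little more concrete, here is a minimal sketch of the kind of concept tagging and linking I have in mind. Everything in it is an assumption made up for illustration: the story titles, the tiny concept graph and the function names are hypothetical, and a real system would draw its concepts from a shared source such as dbpedia rather than a hand-written dictionary.

```python
from collections import defaultdict

# Hypothetical concept graph: each concept points at related and opposite concepts.
CONCEPT_LINKS = {
    "adventure": {"related": ["expedition"], "opposite": ["caution"]},
    "failure": {"related": ["retreat"], "opposite": ["success"]},
}

# Hypothetical stories, each tagged with a set of concepts.
STORIES = [
    {"title": "Past story", "tags": {"adventure", "failure"}},
    {"title": "Present story", "tags": {"failure"}},
    {"title": "Contrasting story", "tags": {"success"}},
]


def build_index(stories):
    """Map each concept to the titles of the stories tagged with it."""
    index = defaultdict(list)
    for story in stories:
        for tag in sorted(story["tags"]):
            index[tag].append(story["title"])
    return index


def linked_stories(title, stories, concept_links):
    """Return stories sharing a concept, or tagged with a related or opposite one."""
    index = build_index(stories)
    story = next(s for s in stories if s["title"] == title)
    links = {"same": set(), "related": set(), "opposite": set()}
    for tag in story["tags"]:
        links["same"].update(index.get(tag, []))
        entry = concept_links.get(tag, {})
        for kind in ("related", "opposite"):
            for concept in entry.get(kind, []):
                links[kind].update(index.get(concept, []))
    links["same"].discard(title)  # a story should not link to itself
    return {kind: sorted(titles) for kind, titles in links.items()}


result = linked_stories("Present story", STORIES, CONCEPT_LINKS)
print(result)
```

The point of the sketch is only that once stories carry concept tags, and the concepts themselves are connected, links to similar and opposite stories can be generated wherever a story appears rather than relying on someone remembering to add them.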

So I suppose I am arguing for a more examined relationship between memory and data. The precise conclusions are less important than the act of paying attention to our assumptions as they evolve, and keeping an eye out for unintended consequences. 

Monday
Jul 4, 2011

Looking back on a year as CI KTN Metadata bloke

This piece appears in the Creative Industries Knowledge Transfer Network's document The Five Key Themes, to be published shortly. The document is a round-up of the work delivered by the KTN's "theme champions" over the course of the last year. Regular readers of this blog will know that I've been the TC for Metadata. What follows is my overview of the year - and the year or so ahead.

It's something of a given that people in the content industries (music, publishing, broadcast, film, games) don’t much like even the term content, let alone the word data. And metadata? What’s that got to do with creativity? Yet even the briefest of surveys of the UK's technology scene reveals a vast amount of highly creative thinking going on around metadata, and cutting edge, ground-breaking and potentially world-beating thinking at that.

Of course, we should have known this all along as the UK's track record is a strong one in this arena. In my own area of music, I look even now at Last.fm; the music fan in me sees a fantastically useful music recommendation and streaming service; the technology professional sees a quite brilliant feat of metadata wrangling.

It may be a truism that services offered by the web can be overwhelming, but it stands; who among us hasn't felt at least occasionally daunted by the web's riches? Now think about the power of music recommendations made by Last or the gig recommendations made by London start-up Songkick and imagine them applied to other areas of our lives, helping to filter the daily deluge. Increasingly intelligent feed readers and news aggregators are just the beginning. The big game is surely in the personalisation of all online and mobile content and services. And that's all about metadata.

But that's just on the consumer end of things. In the content production and distribution sphere, metadata is very much the new frontier; the kinds of asset identification and classification, production mark-up, usage tracking and archiving and auditing facilitated through the smart use of metadata are at once streamlining the production process and releasing more value from content re-use and re-purposing and a vastly better understanding of content's use and place "out in the world". And of course, the intelligent and detailed analysis of content's consumption better enables IP owners to get their stuff to the right audiences at the right time and at the right cost.

And the other creative industries? What value is there in metadata for fashion, say, or for architecture or for the performing arts? Well, for one thing, as the 'Internet of Things' becomes a reality, things themselves get ever closer to data, and information about those things closer to metadata. So here's our upcoming challenge: how to take the lessons learned from the world of data in the virtual, digital world and apply them to the tangible, very real world around us?

Thursday
Apr 14, 2011

On UK Radio Player - and the importance of solid metadata

The following post was originally posted on 14.4.11 on the Creative Industries Knowledge Transfer Network metadata forum. Thanks to the CIKTN team for allowing its reposting.

A couple of weeks ago saw the launch of UK Radio Player, a unified platform for the online distribution of vast amounts of UK radio - most importantly, bringing together content from the BBC and commercial radio. It's a huge achievement logistically, technically... and not least politically. I won't try to describe it - go see for yourself. 

Launch your favourite BBC radio show from bbc.co.uk then use the search box. See how those results are drawn from across the entire UK radio industry? Now that's innovation. Seriously, I spent a half decade of my working life in the BBC's Audio & Radio Interactive department - and have spent another half decade or more working with both BBC radio teams and commercial radio operators - and I can't tell you what a huge step forward this is.

Anyway, the service has been discussed elsewhere at length and in depth, as a model of collaborative business practice, as a blueprint for how the BBC can work with commercial operators, as a shining example of UI integration across multiple brands... and on and on.

But there's something else I'd like to say briefly here - and, well, given the context it's pretty obvious what that is. But to be clear: UK Radio Player is a metadata success story. Or at least is on the way to being one. You've heard this from me before. Here's me at a Partnering for Innovation presentation in London back in September:

"We all know the headlines: the content industries have been transformed by the advent of online technology. The TV, radio, games, music, book publishing, magazine and newspaper and advertising industries have variously changed in the last decade; at their best some companies and sectors have been enhanced and revitalised - but yes, some have been left facing oblivion. Meanwhile, entirely new markets have been born - think about the App economy, lately reckoned to be worth $2 billion... But here's something which doesn't grab the headlines: many of the success stories buried in there are essentially metadata stories."

Well, once again, the metadata angle on the Radio Player story is hardly going to grab any headlines, but the fact remains that this project stands or falls by the quality of the metadata underlying it: data about artists, tracks, shows, presenters, music genres, news and discussion topics ... and that's just on the supply side. Think about all the possibilities around personalisation and recommendation that smart deconstruction of user metadata allows. Imagine Last.fm-style tracking of prefs integrated with the Radio Player.

Well, I'm getting ahead of myself. In truth, the player is a long way from delivering all of that just yet. Play around with it a little bit and you'll start to get a feeling for which datasets do underlie it - and which ones don't. You'll also start to sense which radio networks have got their own metadata in order from the frequency with which they turn up in search results. (Interestingly, the BBC performed pretty poorly in the first few days after launch, which felt weird when searching from within the BBC-branded iteration of the player - but that seems all sorted a couple of weeks on from launch.)
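For anyone who wants to see why metadata completeness shows up so directly in search results, here is a small sketch. It is purely illustrative and says nothing about how the Radio Player is actually built: the station names, the record fields and the functions are all hypothetical, and the search is a naive substring match.

```python
from collections import Counter

# Hypothetical programme records. Station B has supplied almost no metadata.
RECORDS = [
    {"station": "Station A", "show": "Morning Show", "artist": "Artist X", "genre": "jazz"},
    {"station": "Station B", "show": "Drive Time", "artist": None, "genre": None},
    {"station": "Station C", "show": "Late Jazz", "artist": "Artist X", "genre": "jazz"},
]

# The metadata fields the (hypothetical) search looks at.
SEARCH_FIELDS = ("show", "artist", "genre")


def search(records, term):
    """Return records where any searchable field contains the term (case-insensitive)."""
    term = term.lower()
    return [
        record
        for record in records
        if any(term in (record.get(field) or "").lower() for field in SEARCH_FIELDS)
    ]


def hits_per_station(records, terms):
    """Count how often each station turns up across a list of search terms."""
    counts = Counter()
    for term in terms:
        for record in search(records, term):
            counts[record["station"]] += 1
    return counts


counts = hits_per_station(RECORDS, ["jazz", "artist x"])
print(counts)
```

Station B may well be playing the same music as the others, but with empty artist and genre fields it simply never appears, which is the effect described above.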

It'll be fascinating to follow the development of the project, and to see how far it can go in delivering a world-beating internet radio proposition - and from my point of view that's going to depend very largely on the kinds of resources which are brought to bear on a shared metadata infrastructure. In the meantime, it's made a fine start.