
Opening up the BBC's linked data with /things

Robin Murphy

Software Engineer

Today we released a brand new open data website for the BBC.

BBC Things allows anyone to access the data that we store about concepts that matter to our audiences. The website includes data on the places, people and organisations that appear in BBC programmes and online content. This data already powers large parts of the BBC website, including BBC News and Sport, and is now available for anyone to access in standard open data formats.

BBC Things home page

BBC Things is part of our Linked Data Platform, which until now has only been accessible from within the BBC. The new website provides public access to data stored in our platform and, importantly, provides a public reference for all of the things that the BBC creates content about. Whether that's a sports team like Blackburn Rovers, a place like Cambridge or even a BBC News service like BBC Mid Wales, there's now a publicly accessible web page for it on the BBC Things website.

Who is it for?

The BBC Things website is designed to be used by anyone who works with our data at a technical or editorial level. From an editorial perspective, the website makes it easy for content editors, producers and creators to discover concepts that exist in our platform. By searching for events, places, people and other concepts you can quickly find relevant references to associate with your content.

From a technical perspective, the machine-readable formats that the website supports allow developers to create new applications using the BBC's data. You might want to use our data as a reference for concepts that you create content about, use properties maintained by BBC editors about journalistic storylines and events, or even combine our data with other open sources to build entirely new datasets.

Why did we build it?

Our Linked Data Platform currently enables teams within the BBC to build applications and websites centered on the things that matter to our audiences. Whether that's a web page for a Premier League football team or a stream of content for a local council, Linked Data is already providing the platform for new online content within the BBC. With the BBC Things website, developers outside of the BBC can now use our data to create new websites and applications of their own.

In addition to allowing the public to use our data, the BBC Things website enables us to achieve one of the key principles of Linked Open Data: that every concept has its own publicly accessible URL. When we started building our Linked Data Platform, we used a URL to identify each concept in our store. These URLs all start with http://www.bbc.co.uk/things/ and contain a unique ID, or GUID (http://www.bbc.co.uk/things/189be289-16dc-4394-b7dc-e8f022e78d14#id, for example).
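As a rough illustration of that identifier scheme, the Python sketch below pulls the GUID out of an identifier and derives a page URL from it. It assumes that the #id fragment names the concept itself and that dropping the fragment gives the address of the page describing it. The function names are made up for this example and are not part of any BBC library.

```python
import uuid
from urllib.parse import urlparse

THINGS_PREFIX = "http://www.bbc.co.uk/things/"


def thing_guid(uri: str) -> uuid.UUID:
    """Extract the GUID from a BBC Things identifier, ignoring any #fragment."""
    if not uri.startswith(THINGS_PREFIX):
        raise ValueError(f"not a BBC Things identifier: {uri}")
    path = urlparse(uri).path  # the fragment (#id) is not part of the path
    return uuid.UUID(path.rsplit("/", 1)[-1])  # raises ValueError if malformed


def thing_page_url(uri: str) -> str:
    """Assumption: the page for a concept is its identifier minus the fragment."""
    return THINGS_PREFIX + str(thing_guid(uri))


example = "http://www.bbc.co.uk/things/189be289-16dc-4394-b7dc-e8f022e78d14#id"
print(thing_guid(example))
print(thing_page_url(example))
```

Validating the last path segment as a UUID is a cheap way to reject mistyped identifiers before making any request.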

As Tim Berners-Lee, the inventor of the World Wide Web, explains, using URLs as identifiers is a key part of the Semantic Web. Up until now, the URLs that we use as identifiers didn't actually exist on the web. Now that the BBC Things website exists, the tens of thousands of concepts that we store can be accessed directly using their identifiers.

How did we build it?

We built the BBC Things website on top of our existing Linked Data Platform. This means that it's built using the same underlying services as the BBC News and Sport websites and apps. The BBC Things website isn't just a snapshot or archive of our data - it's a window into the live data that powers the BBC online. Journalists and editors continually update the data we store, so concepts can change regularly and new data will appear every day. This is particularly true of News Storylines, which can evolve quickly.

BBC Things page for a News storyline

Every page is available in HTML format (which is what you'll see when you visit the website using a web browser), but these pages also embed additional semantic data that can be extracted by search engines and other applications. This technology is called RDFa, and is one of the ways that developers can make use of the data on the site. In addition to the HTML format, all of the pages are also available in Turtle and JSON-LD formats, as described on the site.
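One common way for a client to ask for a particular representation is HTTP content negotiation. The Python sketch below builds, but does not send, a request with an Accept header for each format. The media types are the registered ones for Turtle and JSON-LD. Whether the site selects formats through the Accept header, a file extension or some other mechanism is an assumption here, so the site's own documentation remains the authority. The build_request helper is hypothetical.

```python
from urllib.request import Request

# Registered media types for the three formats named above.
MEDIA_TYPES = {
    "html": "text/html",
    "turtle": "text/turtle",
    "json-ld": "application/ld+json",
}


def build_request(thing_url: str, fmt: str) -> Request:
    """Build (but do not send) a GET request asking for one representation."""
    # Assumption: the server honours the Accept header for format selection.
    return Request(thing_url, headers={"Accept": MEDIA_TYPES[fmt]})


url = "http://www.bbc.co.uk/things/189be289-16dc-4394-b7dc-e8f022e78d14"
req = build_request(url, "turtle")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would actually send it; that step is left out so the example runs without network access.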

The website itself is built using Node.js and is deployed to the BBC's Cloud Platform. This means that we can use continuous delivery to make improvements to the live site many times a day. The application uses the same API and data store that are in use by other parts of the live BBC website.

When a request is made for a BBC Things page in HTML format, the application requests the corresponding resource in JSON-LD format from the Linked Data Core API - a service that provides basic information about concepts in our store. This API call is made through an API gateway (provided by Apigee), which provides additional security and monitoring for the underlying API endpoints. The Core API in turn queries our triplestore (provided by OWLIM) using the SPARQL query language and converts the result of the query into a JSON-LD response. The BBC Things application then parses the JSON-LD response into a plain JavaScript object and renders it as HTML using the Mustache templating language. When handling JSON-LD and Turtle requests directly, the BBC Things application acts as a simple proxy for the underlying Core API.
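The last two steps of that pipeline, parsing JSON-LD into a plain object and rendering it through a template, can be sketched in a few lines. The example below is in Python rather than the Node.js used by the real application, and it uses a tiny stand-in for Mustache that only handles plain {{name}} variables. The sample response, its preferredLabel key and the template are all invented for illustration; the real Core API response may look quite different.

```python
import html
import json
import re

# Hypothetical JSON-LD response; the real Core API's keys and context may differ.
SAMPLE_JSONLD = """
{
  "@id": "http://www.bbc.co.uk/things/189be289-16dc-4394-b7dc-e8f022e78d14#id",
  "@type": "Place",
  "preferredLabel": "Example <Place>"
}
"""

# Hypothetical template using Mustache-style variable tags.
TEMPLATE = '<h1>{{label}}</h1><p class="type">{{type}}</p>'


def render(template: str, view: dict) -> str:
    """Replace {{name}} tags with HTML-escaped values; missing names render as ''."""
    def substitute(match):
        return html.escape(str(view.get(match.group(1), "")))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


thing = json.loads(SAMPLE_JSONLD)  # JSON-LD is valid JSON, so it parses to a dict
view = {"label": thing["preferredLabel"], "type": thing["@type"]}
print(render(TEMPLATE, view))
```

Escaping values on output mirrors what Mustache does for double-brace variables, which matters because the labels are editor-supplied text.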

What data is exposed?

The BBC Things website gives you access to a set of standard properties that we maintain about the concepts in our store. These include each concept's preferred label (our preferred name for it), disambiguation hint (a way to disambiguate it from other similar concepts - useful when tagging content) and its type (whether it's a person, place, storyline etc.). We also expose links between concepts and web pages about those concepts through the "primary topic of" property. The Things page for Manchester United, for example, has a link to the club's official website.
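To make the role of these properties concrete, here is a small Python sketch of how a client might model them. The Thing class, its field names and the sample values are assumptions made up for this example (including the example.org identifiers and hints); they are not the property names used in the BBC's ontologies.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    """Hypothetical client-side model of the standard properties described above."""
    uri: str
    preferred_label: str
    thing_type: str
    disambiguation_hint: str = ""
    primary_topic_of: list = field(default_factory=list)  # web pages about the concept

    def tag_label(self) -> str:
        """Label for a tagging tool: the hint separates concepts that share a name."""
        if self.disambiguation_hint:
            return f"{self.preferred_label} ({self.disambiguation_hint})"
        return self.preferred_label


# Invented sample data with placeholder identifiers.
candidates = [
    Thing("http://example.org/things/1#id", "Cambridge", "Place", "England"),
    Thing("http://example.org/things/2#id", "Cambridge", "Place", "Massachusetts"),
]
for thing in candidates:
    print(thing.tag_label())
```

Without the hint, both candidates would appear to an editor as "Cambridge", which is the ambiguity the property exists to resolve.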

In addition to these basic properties, we also expose more domain-specific fields for different types of concepts. For places, like London, we provide their latitude and longitude using data that originates from the public GeoNames database. All of the types of data that we maintain about our concepts form ontological models. These models can already be accessed publicly using our Ontologies website, which we released earlier this year.

Whilst many of the properties that we maintain about concepts are accessible through the new website, there are some fields that are not displayed. These include properties that are used for authorship information, internal editorial content and other production metadata. We work closely with editors from around the BBC to decide which data can be made public and our aim is to make as much of the data as we can open by default.

How BBC Things looks on mobile

Where next?

It may only be a simple website, but we feel that it embodies many of BBC Future Media's guiding principles. The website isn't just concerned with one small part of the BBC; instead it opens up data from the whole of the BBC online, from Music to Sport and News to Education. We're also making the most of the BBC's new Cloud Platform to deliver continuous improvements to the live site.

The BBC Things website represents the first step in opening up our Linked Data Platform and making our data accessible to the public. In the coming months we will continue to increase the amount of data that is available on the site and expand the number of ways to discover concepts in our store. We're excited to see what applications can be built using the BBC Things website. If you'd like to know more about the project, or about using the data that we've made available, you can email linkeddata@bbc.co.uk.

Robin Murphy is a Software Developer, BBC Future Media
