Research & Development

Posted by Rob Cooper, Andy Secker

For the last three years, the BBC has worked with the University of Oxford and our Data Science Research Partners, Surrey University and University College London, on an important research project into automated sign language recognition. The first result of this work, the BBC-Oxford British Sign Language Dataset (BOBSL), has just been released and is available for download by the academic community under a non-commercial licence agreement. BOBSL is one of the largest and most comprehensive British Sign Language datasets ever produced.

In 2017 we were approached by two of our Data Science Research Partnership (DSRP) partners, Surrey University and UCL, along with the Visual Geometry Group at the University of Oxford, to support a bid for an academic research project called ExTOL and to sit on its Advisory Board. As part of this support, we discussed creating a dataset from the BBC’s large archive of British Sign Language (BSL) recordings. The BBC produces signed versions of our most popular programmes for the deaf and hard of hearing community, covering a wide range of genres and programme types.

The ExTOL researchers planned to analyse BSL footage using a range of linguistic and technical approaches, with the ultimate goal of building a system to interpret human signing automatically. Their proposed system would do this by watching footage of someone using BSL and analysing the pose, hand and head movements, facial expressions and mouthings. By comparing these complex movements with the labelled data we helped supply, the system could make an informed guess at what was being signed.

The University of Oxford will be sharing the results through a keynote presentation at an upcoming conference, but the dataset used to train the system is being released to the wider academic community now. This will allow other researchers to train their own systems and to compare approaches. The dataset is made up of 1,962 BSL signed programmes from the BBC archive, which comprise around 1,400 hours of footage in total. Each programme has an associated transcript, which has been approximately time-aligned with the BSL signer. The signer has been carefully cropped out of each programme, and background faces blurred to aid the training of machine learning systems. The dataset covers a diverse range of programmes, including drama, documentaries and comedies, and features 39 separate signers.

High quality large scale datasets are critical for modern research in deep learning. The lack of such datasets for sign language recognition has held back research in this area. BOBSL will enable new approaches to be trained and evaluated at a scale that was previously impossible, and will lead to breakthroughs in automated recognition and linguistic understanding of this important means of communicating.

Andrew Zisserman, Professor of Computer Vision Engineering, University of Oxford

We’re extremely proud to have worked with the University of Oxford and our DSRP Partners on this work. One of the most important things any research and development department should do is to foster innovation. An increasingly important way for an organisation like the BBC to do this is by releasing data for machine learning research, which can lead to all sorts of innovation.

For example, automated sign language translation could allow virtual assistants like Siri or Alexa to be adapted to work for the deaf and hard of hearing community. It could allow educational training firms to create interactive tutors that give instant feedback to someone learning BSL. It could also provide written transcripts of BSL videos, making them fully searchable at scale for the first time. Although our data is currently limited to non-commercial use, we hope that academic researchers will prove the viability of such systems and thereby inspire commercial companies to invest in this field.

The Data Science Research Partnership is committed to releasing further datasets in this way. We are working hard with our legal and technical teams to make more data available to our partners and the wider academic community. We’re hoping 2022 will be a bumper year for further data releases.

If you’re an academic researcher, you can apply to get access to the data.

We’d like to thank Red Bee Media, who provide BSL translation services for the BBC, for supporting this work.


