I'm Basile, hacker-journalist with BBC News Labs over in Euston, and I'm just recovering from the third edition of NewsHACK, which took place on December 15 and 16 in London.

Fifty participants came from all over the world to hack with us on language technologies. The Qatar Computing Research Institute (QCRI) sent a team over, as did Germany's international broadcaster Deutsche Welle and the Bulgarian semantic technology company Ontotext; other participants came from Latvia and Portugal.

Matt Shearer, head of News Labs, kicking off the hacking

The event took place in a great startup incubator just 100 yards from Tower Bridge. It was run by the fantastic Connected Studio team, driven by our friends at the World Service, and supported by News Labs.

Our call for collaboration at the event's kick-off proved very fruitful: after some unplanned mergers, the original 13 teams reshuffled into 11. Staff from BBC Monitoring, BBC Weather, Travel, News, Location Services, R&D IRFS and the World Service joined other teams to work on some great projects.

The Winners

From all these projects, the judges picked the following winners:

1- Best in Show: Qatar Computing Research Institute

The team translated BBC Arabic video into English, producing both subtitles and a synthesised voiceover, and used speaker diarisation to switch the gender of the synthesised voice whenever the gender of the original speaker changed.

2- Best practical innovation & Closest to launch: BBC Voice

Tom Collins, Owain Lewis and Darren Lucas from the BBC Weather, Travel News and Location teams demonstrated a great voice-controlled BBC app.

3- Best speech synthesis hack: Cereproc and Red Bee Media

The team demonstrated a pipeline going from subtitles to speech synthesis, with very good, realistic voices, and demoed it in four languages during the presentation!

4- Best entity extraction: Ontotext (proud providers of our LDP software)

They hacked together a great solution for Russian and Arabic entity extraction using DBpedia and Freebase, plus some clever statistical tricks to work around the scarcity of data for those languages in the reference sets.

5- Best end-to-end multilingual tools: “Global Vox” by Edinburgh University

They demonstrated brilliant work across every stage of the chain: from audio, through segmentation, transcription, summarisation, named entity recognition (NER) and translation, to speech synthesis.

6- Most perplexing and steepest learning curve for judges (unplanned award): Andreas from UCL

Awarded for multilingual entity relation extraction and knowledge base work, using cutting-edge machine learning and NLP across languages. He quickly outlined some cutting-edge machine learning approaches which were "quite simple" and "still under review".

Great Outcomes

Over two stressful days of serious hacking, we managed to kick off at least three informal Language Tech collaborations.

We'd like to see more of the project by UCL and Latvia-Prebaram on a multilingual knowledge base for global journalism research tools. We also hope that Cereproc's speech synthesis will enable audio delivery of articles, subtitles, captions and audio description for new language services.

Edinburgh University really impressed us too by handling the whole chain; there is almost too much to write about.

We also hope to work with QCRI on some "Arabic language audio processing" topics, perhaps in partnership with BBC Monitoring.

Team Ontotext

The event was a fantastic success, and we have kicked off some key relationships for our future Language Technology work, which is a key part of reaching another 250 million people globally.

Serious collaboration and hacking at NewsHACK III

Special thanks to Connected Studio for running another dazzlingly good #newsHACK event, to the World Service for driving this new international partnership with us, and to the News Labs team for supporting and networking with our future partners.

Basile Simon is Hacker Journalist, BBC News Labs