The BBC World Service Archive Prototype

11 Pages. Posted: 10 Jul 2018. First Look: Accepted

Yves Raimond

British Broadcasting Company (BBC) - Research and Development; Queen Mary University of London - Centre for Digital Music

Tristan Ferne

British Broadcasting Company (BBC) - Research and Development

Michael Smethurst

British Broadcasting Company (BBC) - Research and Development

Gareth Adams

British Broadcasting Company (BBC) - Research and Development

Abstract

Most broadcasters have accumulated large audio and video archives stretching back over many decades. For example, the BBC World Service radio archive includes around 70,000 English-language programmes from over 45 years. This amounts to about three years of continuous audio and around 15TB of data. The metadata around this archive is sparse and sometimes wrong, but the full audio content is available in digital form. We have built a system to process the existing audio and text and automatically annotate programmes within the archive with Linked Data web identifiers. The resulting interlinks are used to bootstrap search and navigation within this archive and expose it to users. Automated data will never be entirely accurate, so we built crowdsourcing mechanisms for users to correct and add data. The resulting crowdsourced data is then used to improve search and navigation within the archive, as well as to evaluate and improve our algorithms. As a result of this feedback cycle, the interlinks between our archive and the Semantic Web are continuously improving. This unique combination of Semantic Web technologies, automation and crowdsourcing has dramatically reduced the amount of time and effort required to publish this rich archive online. The BBC World Service archive prototype is available online at http://worldservice.prototyping.bbc.co.uk (last accessed March 2014).

Keywords: Crowdsourcing, Semantic Web, Automated tagging, Speaker identification, Interlinking, Archives, BBC

Suggested Citation

Raimond, Yves and Ferne, Tristan and Smethurst, Michael and Adams, Gareth, The BBC World Service Archive Prototype (2014). Journal of Web Semantics First Look. Available at SSRN: https://ssrn.com/abstract=3199103 or http://dx.doi.org/10.2139/ssrn.3199103

Yves Raimond (Contact Author)

British Broadcasting Company (BBC) - Research and Development

London
United Kingdom

Queen Mary University of London - Centre for Digital Music

Mile End Road
London, E1 4NS
United Kingdom

Tristan Ferne

British Broadcasting Company (BBC) - Research and Development

London
United Kingdom

Michael Smethurst

British Broadcasting Company (BBC) - Research and Development

London
United Kingdom

Gareth Adams

British Broadcasting Company (BBC) - Research and Development

London
United Kingdom
