Creating voiD Descriptions for Web-scale Data

10 Pages. Posted: 24 Jun 2018. Publication Status: Accepted

Christoph Böhm

University of Potsdam - Hasso Plattner Institute (HPI)

Johannes Lorey

University of Potsdam - Hasso Plattner Institute (HPI)

Felix Naumann

University of Potsdam - Hasso Plattner Institute (HPI)

Abstract

When working with large amounts of crawled semantic data, such as the data provided by the Billion Triple Challenge (BTC), it is desirable to present the data in a manner best suited for end users. This includes conceiving and presenting explanatory metainformation. The Vocabulary of Interlinked Datasets (voiD) has been proposed as a means to annotate sets of RDF resources to facilitate not only human understanding, but also query optimization.

In this article we introduce tools that automatically generate voiD descriptions for large datasets. Our approach comprises different means to identify (sub)datasets and annotate the derived subsets according to the voiD specification. Due to the complexity of Web-scale Linked Data, all algorithms used for partitioning and augmenting are implemented in a cloud environment utilizing the MapReduce paradigm. We employed the Billion Triple Challenge 2010 dataset [6] to evaluate our approach, and present the results in this article. We have released a tool named voiDgen to the public that allows the generation of metainformation for such large datasets.
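
The partition-then-annotate idea described above can be illustrated with a minimal, single-process sketch in Python. It is not the voiDgen implementation: the grouping criterion (the host of the subject URI), the function names `map_triple`, `reduce_dataset`, and `run`, and the in-memory grouping that stands in for a MapReduce shuffle are all assumptions made for illustration. The statistics it reports use property names from the voiD vocabulary (`void:triples`, `void:distinctSubjects`, `void:properties`).

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_triple(triple):
    """Map step (hypothetical): key each triple by the host of its subject URI."""
    subject, predicate, obj = triple
    host = urlparse(subject).netloc or "unknown"
    yield host, (subject, predicate, obj)


def reduce_dataset(host, triples):
    """Reduce step (hypothetical): compute a few voiD statistics for one partition."""
    triples = list(triples)
    return {
        "dataset": host,
        "void:triples": len(triples),
        "void:distinctSubjects": len({s for s, _, _ in triples}),
        "void:properties": len({p for _, p, _ in triples}),
    }


def run(triples):
    """Group the map output by key in memory, then reduce each group.

    A real MapReduce framework would perform this grouping in its
    distributed shuffle phase; a dictionary is used here for brevity.
    """
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_triple(triple):
            groups[key].append(value)
    return [reduce_dataset(key, values) for key, values in sorted(groups.items())]


if __name__ == "__main__":
    sample = [
        ("http://example.org/a", "http://xmlns.com/foaf/0.1/name", "Alice"),
        ("http://example.org/a", "http://xmlns.com/foaf/0.1/knows", "http://example.org/b"),
        ("http://example.com/x", "http://xmlns.com/foaf/0.1/name", "X"),
    ]
    for description in run(sample):
        print(description)
```

Grouping by subject host is only one conceivable way to delimit a (sub)dataset; other criteria would replace the key emitted in `map_triple` while leaving the reduce step unchanged.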

Keywords: Semantic Web, Vocabulary of Interlinked Datasets, Semantic Data Profiling, RDF Metadata Generation, Cloud Computing

Suggested Citation

Böhm, Christoph and Lorey, Johannes and Naumann, Felix, Creating voiD Descriptions for Web-scale Data (May 27, 2011). Journal of Web Semantics First Look 9_3_10, Available at SSRN: https://ssrn.com/abstract=3199519 or http://dx.doi.org/10.2139/ssrn.3199519

Christoph Böhm (Contact Author)

University of Potsdam - Hasso Plattner Institute (HPI)

Prof.-Dr.-Helmert-Str. 2-3,
Potsdam
Germany

Johannes Lorey

University of Potsdam - Hasso Plattner Institute (HPI)

Prof.-Dr.-Helmert-Str. 2-3,
Potsdam
Germany

Felix Naumann

University of Potsdam - Hasso Plattner Institute (HPI)

Prof.-Dr.-Helmert-Str. 2-3,
Potsdam
Germany
