Binary RDF Representation for Publication and Exchange (HDT)

25 Pages. Posted: 24 Jun 2018. First Look: Accepted

Javier D. Fernandez

University of Valladolid - DataWeb Research

Miguel A. Martínez-Prieto

University of Valladolid - DataWeb Research

Claudio Gutiérrez

University of Chile - Department of Computer Science

Axel Polleres

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

Mario Arias

University of Valladolid - DataWeb Research

Abstract

The current Web of Data is producing increasingly large RDF datasets. Massive publication efforts of RDF data, driven by initiatives like the Linked Open Data movement, and the need to exchange large datasets have unveiled the drawbacks of traditional RDF representations, which were inspired by and designed for a document-centric and human-readable Web. Among the main problems are high levels of verbosity and redundancy and weak machine-processable capabilities in the description of these datasets. This scenario calls for efficient formats for publication and exchange.

This article presents a binary RDF representation addressing these issues. Based on a set of metrics that characterize the skewed structure of real-world RDF data, we propose an RDF representation that modularly partitions and efficiently represents three components of RDF datasets: Header information, a Dictionary, and the actual Triples structure (hence the name HDT). Our experimental evaluation shows that datasets in HDT format can be compacted by a factor of more than fifteen compared to current naive representations, improving both parsing and processing while keeping a consistent publication scheme. Specific compression techniques applied over HDT further improve these compression rates and outperform existing compression solutions for efficient RDF exchange.
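
To make the three-part split concrete, the following is a minimal Python sketch of the general idea: each distinct term is stored once in a dictionary, and the triples are kept as integer IDs. It is an illustration only, not the authors' binary encoding. The function names (encode, decode), the header fields, and the single shared ID space are all assumptions made for this example; the abstract does not specify how the Dictionary or Triples components are laid out.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: List[Triple]) -> Tuple[Dict[str, int], Dict[str, int], List[IdTriple]]:
    """Split a list of RDF triples into (header, dictionary, ID triples).

    Hypothetical toy layout: one shared ID space for all terms.
    """
    # Dictionary: every distinct term is stored once and mapped to an integer ID.
    terms = sorted({term for triple in triples for term in triple})
    dictionary = {term: i + 1 for i, term in enumerate(terms)}

    # Triples: the graph structure, expressed only with integer IDs.
    # A set removes duplicate statements, since an RDF graph is a set of triples.
    id_triples = sorted({(dictionary[s], dictionary[p], dictionary[o]) for s, p, o in triples})

    # Header: descriptive metadata about the dataset (fields chosen for illustration).
    header = {"triples": len(id_triples), "terms": len(dictionary)}
    return header, dictionary, id_triples


def decode(dictionary: Dict[str, int], id_triples: List[IdTriple]) -> List[Triple]:
    """Rebuild the string triples from the dictionary and the ID triples."""
    reverse = {i: term for term, i in dictionary.items()}
    return [(reverse[s], reverse[p], reverse[o]) for s, p, o in id_triples]


if __name__ == "__main__":
    data = [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:alice"),
        ("ex:alice", "ex:name", '"Alice"'),
    ]
    header, dictionary, ids = encode(data)
    print(header)
    print(ids)
    print(decode(dictionary, ids))
```

Because long IRIs and literals that recur across many triples are written only once, the ID-based triples are much smaller than a plain-text serialization that repeats them; this is the kind of redundancy the abstract refers to.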

Keywords: RDF, Binary formats, Data compaction and compression, RDF metrics

Suggested Citation

Fernandez, Javier D. and Martínez-Prieto, Miguel A. and Gutiérrez, Claudio and Polleres, Axel and Arias, Mario, Binary RDF Representation for Publication and Exchange (HDT) (2013). Journal of Web Semantics First Look. Available at SSRN: https://ssrn.com/abstract=3198999 or http://dx.doi.org/10.2139/ssrn.3198999

Javier D. Fernandez (Contact Author)

University of Valladolid - DataWeb Research

Valladolid, Valladolid
Spain

Miguel A. Martínez-Prieto

University of Valladolid - DataWeb Research

Valladolid, Valladolid
Spain

Claudio Gutiérrez

University of Chile - Department of Computer Science

Avenida Blanco Encalada
Santiago
Chile

Axel Polleres

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Galway
Ireland

Mario Arias

University of Valladolid - DataWeb Research

Valladolid, Valladolid
Spain

Paper statistics

Abstract Views: 124
Downloads: 4