Triple Storage for Random-Access Versioned Querying of RDF Archives

35 Pages Posted: 12 Sep 2018 First Look: Accepted

Ruben Taelman

Ghent University-Universiteit Gent - IDLab

Miel Vander Sande

Ghent University-Universiteit Gent - IDLab

Joachim Van Herwegen

Ghent University-Universiteit Gent - IDLab

Erik Mannens

Ghent University-Universiteit Gent - IDLab

Ruben Verborgh

Ghent University-Universiteit Gent - IDLab

Abstract

When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions (that is, determining the versions in which a result holds). Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new tradeoff regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.
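To make the three query types in the abstract concrete, the following is a minimal Python sketch of a toy archive that stores an initial snapshot plus one delta per later version. It is an illustration only and not OSTRICH's index or API: the class `NaiveArchive` and the methods `query_at`, `query_between`, and `query_versions` are hypothetical names, and the sketch rebuilds each version from scratch, which is exactly the cost that precomputed metadata is meant to avoid.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A triple pattern; None acts as a wildcard in that position.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class NaiveArchive:
    """Hypothetical toy archive: version 0 is a snapshot, later versions are deltas."""

    def __init__(self, snapshot: Iterable[Triple]) -> None:
        self.snapshot: Set[Triple] = set(snapshot)
        # One (additions, deletions) pair per version after the snapshot.
        self.deltas: List[Tuple[Set[Triple], Set[Triple]]] = []

    def append(self, additions: Iterable[Triple], deletions: Iterable[Triple]) -> int:
        """Store a new version as a delta and return its version number."""
        self.deltas.append((set(additions), set(deletions)))
        return len(self.deltas)

    def _materialize(self, version: int) -> Set[Triple]:
        """Rebuild the full triple set of a version by replaying deltas."""
        triples = set(self.snapshot)
        for additions, deletions in self.deltas[:version]:
            triples -= deletions
            triples |= additions
        return triples

    @staticmethod
    def _matches(triple: Triple, pattern: Pattern) -> bool:
        return all(p is None or p == t for t, p in zip(triple, pattern))

    def query_at(self, pattern: Pattern, version: int, offset: int = 0) -> List[Triple]:
        """Query at a certain version; offset skips the first sorted results."""
        matches = sorted(t for t in self._materialize(version) if self._matches(t, pattern))
        return matches[offset:]

    def query_between(
        self, pattern: Pattern, start: int, end: int
    ) -> Tuple[List[Triple], List[Triple]]:
        """Query between two versions; returns (added, removed) matching triples."""
        before, after = self._materialize(start), self._materialize(end)
        added = sorted(t for t in after - before if self._matches(t, pattern))
        removed = sorted(t for t in before - after if self._matches(t, pattern))
        return added, removed

    def query_versions(self, pattern: Pattern) -> Dict[Triple, List[int]]:
        """Query for versions; maps each matching triple to the versions containing it."""
        result: Dict[Triple, List[int]] = {}
        for version in range(len(self.deltas) + 1):
            for triple in self._materialize(version):
                if self._matches(triple, pattern):
                    result.setdefault(triple, []).append(version)
        return result
```

For example, after creating `NaiveArchive({("a", "p", "b"), ("a", "p", "c")})` and calling `append({("a", "p", "d")}, {("a", "p", "b")})`, the call `query_at(("a", None, None), 1, offset=1)` skips the first sorted result of version 1. In this sketch the offset is applied only after all matches are collected; the abstract's point is that an index can support such offsets efficiently.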

Keywords: Linked Data, RDF archiving, Semantic Data Versioning, storage, indexing

Suggested Citation

Taelman, Ruben and Sande, Miel Vander and Herwegen, Joachim Van and Mannens, Erik and Verborgh, Ruben, Triple Storage for Random-Access Versioned Querying of RDF Archives (September 12, 2018). Journal of Web Semantics First Look. Available at SSRN: https://ssrn.com/abstract=3248501 or http://dx.doi.org/10.2139/ssrn.3248501

Ruben Taelman (Contact Author)

Ghent University-Universiteit Gent - IDLab

Gent
Belgium

Miel Vander Sande

Ghent University-Universiteit Gent - IDLab

Gent
Belgium

Joachim Van Herwegen

Ghent University-Universiteit Gent - IDLab

Gent
Belgium

Erik Mannens

Ghent University-Universiteit Gent - IDLab

Gent
Belgium

Ruben Verborgh

Ghent University-Universiteit Gent - IDLab

Gent
Belgium
