Querying a Messy Web of Data with Avalanche

39 Pages Posted: 10 Jul 2018 First Look: Accepted

Cosmin Basca

University of Zurich - Dynamic and Distributed Information Systems Group

Abraham Bernstein

University of Zurich - Dynamic and Distributed Information Systems Group

Abstract

Recent efforts have enabled applications to query the entire Semantic Web. Such approaches are based either on a centralised store or on link traversal and URI dereferencing, as is often the case for Linked Open Data. These approaches make additional assumptions about the structure and/or location of data on the Web and are likely to limit the diversity of resulting usages.

In this article we propose a technique called Avalanche, designed for querying the Semantic Web without making any prior assumptions about the data location or distribution, schema-alignment, pertinent statistics, data evolution, and accessibility of servers. Specifically, Avalanche finds up-to-date answers to queries over SPARQL endpoints. It first gets on-line statistical information about potential data sources and their data distribution. Then, it plans and executes the query in a concurrent and distributed manner, trying to quickly provide first answers.
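
The three steps named above (gather statistics, plan, execute concurrently) can be illustrated with a minimal sketch. This is not the Avalanche implementation: the endpoints are in-memory stand-ins rather than real SPARQL servers, the query is reduced to a single predicate lookup, and every name (`ENDPOINTS`, `count_matches`, `fetch_matches`, `plan`, `execute`) is hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory stand-ins for SPARQL endpoints:
# endpoint name -> list of (subject, predicate, object) triples.
ENDPOINTS = {
    "ep_a": [("alice", "knows", "bob"), ("bob", "knows", "carol")],
    "ep_b": [("carol", "worksAt", "uzh")],
    "ep_c": [("dave", "knows", "alice")],
}


def count_matches(endpoint, predicate):
    """Statistics step: ask one endpoint how many triples use the predicate."""
    return sum(1 for _, p, _ in ENDPOINTS[endpoint] if p == predicate)


def fetch_matches(endpoint, predicate):
    """Execution step: retrieve the matching triples from one endpoint."""
    return [t for t in ENDPOINTS[endpoint] if t[1] == predicate]


def plan(predicate):
    """Keep only endpoints that report a match, largest count first."""
    counts = {ep: count_matches(ep, predicate) for ep in ENDPOINTS}
    return sorted(
        (ep for ep, n in counts.items() if n > 0),
        key=lambda ep: (-counts[ep], ep),
    )


def execute(predicate):
    """Query the planned endpoints concurrently, yielding results as they arrive."""
    sources = plan(predicate)
    if not sources:
        return
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(fetch_matches, ep, predicate) for ep in sources]
        for future in as_completed(futures):
            yield from future.result()
```

Because `execute` is a generator, a caller can consume the first answers as soon as any endpoint responds, without waiting for the slowest source. The statistics are collected at query time rather than assumed in advance, which mirrors the on-line character described in the abstract.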

We empirically evaluate Avalanche using the realistic FedBench data-set over 26 servers and investigate its behaviour for varying degrees of instance-level distribution "messiness" using the LUBM synthetic dataset spread over 100 servers. Results show that Avalanche is robust and stable in spite of varying network latency, finding first results for 80% of the queries in under one second. It also exhibits stability for some classes of queries when instance-level distribution messiness increases. We also illustrate how Avalanche addresses the other sources of messiness (pertinent data statistics, data evolution and data presence) by design and show its robustness by removing endpoints during query execution.

Keywords: federated SPARQL, RDF distribution messiness, query planning, adaptive querying, changing network conditions

Suggested Citation

Basca, Cosmin and Bernstein, Abraham, Querying a Messy Web of Data with Avalanche (2014). Journal of Web Semantics First Look. Available at SSRN: https://ssrn.com/abstract=3199098 or http://dx.doi.org/10.2139/ssrn.3199098

Cosmin Basca (Contact Author)

University of Zurich - Dynamic and Distributed Information Systems Group

Plattenstrasse 14
Zurich
Switzerland

Abraham Bernstein

University of Zurich - Dynamic and Distributed Information Systems Group

Plattenstrasse 14
Zurich
Switzerland

Paper statistics

Abstract Views
127
Downloads
2