Enhancing Answer Completeness of SPARQL Queries via Crowdsourcing

33 Pages Posted: 2 Jul 2018 First Look: Accepted

Maribel Acosta

Karlsruhe Institute of Technology - Institute AIFB

Elena Simperl

University of Southampton - School of Electronics and Computer Science (ECS)

Fabian Flock

GESIS - Leibniz Institute for the Social Sciences

Maria-Esther Vidal

Universidad Simón Bolívar (USB) - Computer Science Department

Abstract

Linked Open Data initiatives have encouraged the publication of large RDF datasets in the Linking Open Data (LOD) cloud, including DBpedia, YAGO, and GeoNames. Despite the size of LOD datasets and the development of (semi-)automatic methods to create and link LOD data, these datasets may still be incomplete, which negatively affects the accuracy of Linked Data processing techniques. We address query answer completeness by capturing knowledge collected from the crowd, and propose a novel hybrid query processing engine that brings together machine and human computation to execute SPARQL queries. Our system, HARE, implements these hybrid query processing techniques. HARE encompasses several features: (1) a completeness model for RDF that exploits the characteristics of RDF to estimate the completeness of an RDF dataset; (2) a crowd knowledge base that captures crowd answers about missing values in the RDF dataset; (3) a query engine that combines on-the-fly crowd knowledge with estimates provided by the RDF completeness model to decide which sub-queries of a SPARQL query should be executed against the dataset and which via crowd computing, in order to enhance query answer completeness; and (4) a microtask manager that exploits the semantics encoded in the dataset's RDF properties to crowdsource SPARQL sub-queries as microtasks and to update the crowd knowledge base with the results from the crowd. The effectiveness and efficiency of HARE are empirically studied on a collection of 50 SPARQL queries against the DBpedia dataset. Experimental results clearly show that our solution accurately enhances answer completeness.
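
The interplay of the four features listed in the abstract can be illustrated with a small sketch. This is not HARE's implementation: the class name HybridRouter, the 0.5 threshold, and the completeness heuristic (a subject's number of values for a predicate divided by the mean number of values that subjects in the dataset have for that predicate) are all assumptions made for illustration, and the abstract does not specify the actual completeness model.

```python
from dataclasses import dataclass, field

Key = tuple[str, str]  # (subject, predicate); hypothetical simplification of a triple pattern


@dataclass
class HybridRouter:
    """Toy router deciding whether a sub-query goes to the dataset or the crowd."""

    dataset: dict[Key, set[str]]                                # known objects per (subject, predicate)
    crowd_kb: dict[Key, set[str]] = field(default_factory=dict)  # answers collected from the crowd
    threshold: float = 0.5                                       # assumed cut-off, not from the paper

    def estimated_completeness(self, subject: str, predicate: str) -> float:
        # Assumed heuristic: compare this subject's value count with the
        # mean value count over all subjects that use the same predicate.
        counts = [len(objs) for (_, p), objs in self.dataset.items() if p == predicate]
        if not counts:
            return 0.0
        mean = sum(counts) / len(counts)
        if mean == 0:
            return 0.0
        have = len(self.dataset.get((subject, predicate), set()))
        return min(1.0, have / mean)

    def route(self, subject: str, predicate: str) -> str:
        # Crowd knowledge gathered earlier is reused before asking again.
        if (subject, predicate) in self.crowd_kb:
            return "dataset+crowd_kb"
        if self.estimated_completeness(subject, predicate) < self.threshold:
            return "crowd"
        return "dataset"

    def record_crowd_answer(self, subject: str, predicate: str, value: str) -> None:
        # Stands in for the microtask manager writing results back.
        self.crowd_kb.setdefault((subject, predicate), set()).add(value)

    def answers(self, subject: str, predicate: str) -> set[str]:
        # Final answer is the union of dataset values and crowd values.
        key = (subject, predicate)
        return self.dataset.get(key, set()) | self.crowd_kb.get(key, set())
```

In this sketch, a subject with far fewer values than its peers for a given predicate is routed to the crowd, and once a crowd answer is recorded, later queries combine it with the dataset values instead of issuing a new microtask.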

Keywords: RDF Data, Crowd Knowledge, Query Execution, Crowdsourcing, Hybrid System, Microtasks, Completeness Model, SPARQL Query

Suggested Citation

Acosta, Maribel and Simperl, Elena and Flock, Fabian and Vidal, Maria-Esther, Enhancing Answer Completeness of SPARQL Queries via Crowdsourcing (August 2017). Journal of Web Semantics First Look. Available at SSRN: https://ssrn.com/abstract=3199306 or http://dx.doi.org/10.2139/ssrn.3199306

Maribel Acosta (Contact Author)

Karlsruhe Institute of Technology - Institute AIFB

Building 05.20 KIT-Campus South
Karlsruhe, D-76128
Germany

Elena Simperl

University of Southampton - School of Electronics and Computer Science (ECS)

University Road
Southampton
United Kingdom

Fabian Flock

GESIS - Leibniz Institute for the Social Sciences

Unter Sachsenhausen 6-8, 50667 Köln
Mannheim, 68159
Germany

Maria-Esther Vidal

Universidad Simón Bolívar (USB) - Computer Science Department

Sartenejas
Caracas
Venezuela
