SINA: Semantic Interpretation of User Queries for Question Answering on Interlinked Data

18 Pages. Posted: 9 Jul 2018. First Look: Accepted

Saeedeh Shekarpour

University of Dayton

Edgard Marx

University of Bonn - Department of Enterprise Information Systems (EIS)

Axel-Cyrille Ngonga Ngomo

University of Leipzig - Agile Knowledge Engineering and Semantic Web (AKSW)

Sören Auer

University of Leipzig - Department of Computer Science

Abstract

The architectural choices underlying Linked Data have led to a compendium of data sources which contain both duplicated and fragmented information on a large number of domains. One way to enable non-expert users to access this data compendium is to provide keyword search frameworks that can capitalize on the inherent characteristics of Linked Data. Developing such systems is challenging for three main reasons. First, resources across different datasets or even within the same dataset can be homonyms. Second, different datasets employ heterogeneous schemas, and each one may contain only a part of the answer for a given user query. Finally, constructing a federated formal query from keywords across different datasets requires exploiting links between the different datasets on both the schema and instance levels. We present SINA, a scalable keyword search system that can answer user queries by transforming user-supplied keywords or natural-language queries into conjunctive SPARQL queries over a set of interlinked data sources. SINA uses a hidden Markov model to determine the most suitable resources for a user-supplied query from different datasets. Moreover, our framework is able to construct federated queries by using the disambiguated resources and leveraging the link structure underlying the datasets to be queried. We evaluate SINA over three different datasets. We can answer 25 queries from QALD-1 correctly. Moreover, we perform as well as the best question answering system from the QALD-3 competition by answering 32 questions correctly while also being able to answer queries on distributed sources. We study the runtime of SINA in its mono-core and parallel implementations and draw preliminary conclusions on the scalability of keyword search on Linked Data.
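
To illustrate the general idea of using a hidden Markov model to pick one resource per keyword, the following is a minimal sketch of Viterbi decoding over candidate resources. It is not SINA's implementation: the function name, the `ex:` resource identifiers, and all probabilities are hypothetical values chosen for illustration, and the abstract does not specify how the model's parameters are obtained.

```python
from typing import Dict, List, Tuple

Resource = str
Keyword = str


def viterbi(
    keywords: List[Keyword],
    candidates: Dict[Keyword, List[Resource]],
    start: Dict[Resource, float],
    trans: Dict[Tuple[Resource, Resource], float],
    emit: Dict[Tuple[Resource, Keyword], float],
) -> Tuple[float, List[Resource]]:
    """Return (probability, path) of the most likely resource sequence."""
    first = keywords[0]
    # best[r] holds the best-scoring path that ends in resource r.
    best = {
        r: (start.get(r, 0.0) * emit.get((r, first), 0.0), [r])
        for r in candidates[first]
    }
    for kw in keywords[1:]:
        nxt = {}
        for r in candidates[kw]:
            e = emit.get((r, kw), 0.0)
            # Extend every previous path by r and keep the most probable one.
            prob, path = max(
                ((p * trans.get((prev, r), 0.0) * e, pth)
                 for prev, (p, pth) in best.items()),
                key=lambda t: t[0],
            )
            nxt[r] = (prob, path + [r])
        best = nxt
    return max(best.values(), key=lambda t: t[0])


# Hypothetical toy model: the keyword "berlin" is a homonym (city vs. band).
keywords = ["berlin", "mayor"]
candidates = {
    "berlin": ["ex:Berlin_City", "ex:Berlin_Band"],
    "mayor": ["ex:mayor"],
}
start = {"ex:Berlin_City": 0.5, "ex:Berlin_Band": 0.5}
emit = {
    ("ex:Berlin_City", "berlin"): 0.9,
    ("ex:Berlin_Band", "berlin"): 0.9,
    ("ex:mayor", "mayor"): 1.0,
}
# Assumed transition weights: a city is far more likely to link to a mayor.
trans = {
    ("ex:Berlin_City", "ex:mayor"): 0.8,
    ("ex:Berlin_Band", "ex:mayor"): 0.1,
}

prob, path = viterbi(keywords, candidates, start, trans, emit)
print(path)
```

In this toy setting both readings of "berlin" match the keyword equally well, so the assumed transition weights alone decide the outcome: the decoded path selects the city rather than the band.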

Keywords: Keyword Search, Question Answering, Hidden Markov Model, SPARQL, RDF, Linked Data, Disambiguation

Suggested Citation

Shekarpour, Saeedeh and Marx, Edgard and Ngonga Ngomo, Axel-Cyrille and Auer, Sören, SINA: Semantic Interpretation of User Queries for Question Answering on Interlinked Data (January 2015). Journal of Web Semantics First Look. Available at SSRN: https://ssrn.com/abstract=3199174 or http://dx.doi.org/10.2139/ssrn.3199174

Saeedeh Shekarpour (Contact Author)

University of Dayton

Dayton, OH 45469-2160
United States

HOME PAGE: http://shekarpour.org/

Edgard Marx

University of Bonn - Department of Enterprise Information Systems (EIS)

Regina-Pacis-Weg 3
Postfach 2220
Bonn, D-53012
Germany

Axel-Cyrille Ngonga Ngomo

University of Leipzig - Agile Knowledge Engineering and Semantic Web (AKSW)

Augustusplatz 10/11
Leipzig, 04109
Germany

Sören Auer

University of Leipzig - Department of Computer Science

Augustusplatz 10/11
Leipzig, 04109
Germany
