Lightweight Integration of IR & DB for Scalable Hybrid Search with Integrated Ranking Support

34 Pages
Posted: 22 Jun 2018
Publication Status: Accepted

Haofen Wang

Shanghai Jiao Tong University (SJTU); Gowild Robotics Co. Ltd

Thanh Tran

Karlsruhe Institute of Technology - Institute of Applied Informatics and Formal Description Methods (AIFB)

Chang Liu

Shanghai Jiao Tong University (SJTU)

Linyun Fu

Shanghai Jiao Tong University (SJTU)

Abstract

The Web contains a large number of documents and a growing quantity of structured data in the form of RDF triples. Many of these triples are annotations associated with documents. While structured queries are the principal means of retrieving structured data, keyword queries are typically used for document retrieval. A form of hybrid search that seamlessly integrates these formalisms to query both textual and structured data can address more complex information needs. However, hybrid search in the large-scale Web environment faces several challenges. First, there is a need for repositories that can store and index large amounts of semantic data as well as textual data in documents, and manage them in an integrated way. Second, methods for hybrid query answering are needed to exploit the data from such an integrated repository. These methods should be fast and scalable, and in particular, they should support flexible ranking schemes that return only the most relevant results rather than all of them. In this paper, we present CE2, an integrated solution that leverages mature information retrieval and database technologies to support large-scale hybrid search. For scalable and integrated management of data, CE2 integrates off-the-shelf database solutions with inverted indexes. Efficient hybrid query processing is supported through novel data structures and algorithms that allow advanced ranking schemes to be tightly integrated. Furthermore, a concrete ranking scheme is proposed that takes features from both textual and structured data into account. Experiments conducted on DBpedia and Wikipedia show that CE2 performs well in terms of both effectiveness and efficiency.
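
To make the idea of hybrid search concrete, the following is a minimal sketch, not CE2's actual data structures or ranking scheme, which the abstract does not specify. It assumes a toy in-memory inverted index for keywords, a set of (subject, predicate, object) triples standing in for the database side, and a simple linear score that mixes a textual feature with a structural one. All names (`build_inverted_index`, `hybrid_search`, `alpha`) and the sample data are hypothetical.

```python
import heapq
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def hybrid_search(index, triples, keywords, patterns, k=2, alpha=0.5):
    """Return the top-k (score, doc_id) pairs for a hybrid query.

    A document qualifies if it contains every keyword and is the subject
    of a triple matching every (predicate, object) pattern.
    Assumes at least one keyword is given.
    """
    # Keyword part: intersect the posting lists of all keywords.
    postings = [index.get(term.lower(), {}) for term in keywords]
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)

    # Structured part: keep only documents matching every triple pattern.
    for pred, obj in patterns:
        candidates &= {s for (s, p, o) in triples if p == pred and o == obj}

    # Assumed ranking: weighted sum of a textual feature (summed term
    # frequency) and a structural feature (number of triples about the doc).
    def score(doc_id):
        text = sum(posting[doc_id] for posting in postings)
        struct = sum(1 for (s, _, _) in triples if s == doc_id)
        return alpha * text + (1 - alpha) * struct

    # Return only the k best results instead of all matches.
    return heapq.nlargest(k, ((score(d), d) for d in candidates))


# Hypothetical sample data: documents plus RDF-like annotations.
docs = {
    "d1": "hybrid search over rdf data",
    "d2": "keyword search for documents",
    "d3": "hybrid search search engines",
}
triples = {
    ("d1", "topic", "Search"),
    ("d3", "topic", "Search"),
    ("d3", "author", "Wang"),
    ("d2", "topic", "Databases"),
}

index = build_inverted_index(docs)
results = hybrid_search(index, triples, ["hybrid", "search"], [("topic", "Search")])
print(results)
```

Scanning all triples for each pattern is only acceptable at toy scale; a system aiming at Web-scale data would presumably delegate that step to indexed database lookups, which is the kind of integration the abstract describes.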

Keywords: IR & DB integration, hybrid search, scalable query processing, inverted index, ranking

Suggested Citation

Wang, Haofen and Tran, Thanh and Liu, Chang and Fu, Linyun, Lightweight Integration of IR & DB for Scalable Hybrid Search with Integrated Ranking Support (2011). Journal of Web Semantics First Look, Available at SSRN: https://ssrn.com/abstract=3199538 or http://dx.doi.org/10.2139/ssrn.3199538

Haofen Wang (Contact Author)

Shanghai Jiao Tong University (SJTU)

KoGuan Law School
Shanghai 200030, Shanghai 200052
China

Gowild Robotics Co. Ltd

Shenzhen, 518057
China

Thanh Tran

Karlsruhe Institute of Technology - Institute of Applied Informatics and Formal Description Methods (AIFB)

Kaiserstraße 12
Karlsruhe, Baden-Württemberg 76131
Germany

Chang Liu

Shanghai Jiao Tong University (SJTU)

KoGuan Law School
Shanghai 200030, Shanghai 200052
China

Linyun Fu

Shanghai Jiao Tong University (SJTU)

KoGuan Law School
Shanghai 200030, Shanghai 200052
China
