Relevance-based Retrieval on Hidden-Web Text Databases without Ranking Support


Vagelis Hristidis


affiliation not provided to SSRN

Yuheng Hu


affiliation not provided to SSRN

Panagiotis G. Ipeirotis


New York University - Leonard N. Stern School of Business

September 1, 2009

IEEE Transactions on Knowledge and Data Engineering

Abstract:     
Many online or local data sources provide powerful querying mechanisms but limited ranking capabilities. For instance, PubMed allows users to submit highly expressive Boolean keyword queries, but ranks the query results by date only. However, a user would typically prefer a ranking by relevance, measured by an information retrieval (IR) ranking function. A naive approach would be to submit a disjunctive query with all query keywords, retrieve all the returned matching documents, and then re-rank them. Unfortunately, such an operation would be very expensive due to the large number of results returned by disjunctive queries. In this paper, we present algorithms that return the top results for a query, ranked according to an IR-style ranking function, while operating on top of a source with a Boolean query interface and no ranking capabilities (or a ranking capability of no interest to the end user). The algorithms generate a series of conjunctive queries that return only documents that are candidates for being highly ranked according to a relevance metric. Our approach can also be applied to other settings where the ranking is monotonic on a set of factors (query keywords in IR) and the source query interface is a Boolean expression of these factors. Our comprehensive experimental evaluation on the PubMed database and a TREC dataset shows that we achieve an order-of-magnitude improvement over the current baseline approaches.
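
The abstract contrasts a naive disjunctive-query baseline with a strategy that issues conjunctive queries likely to return only highly ranked candidates. The Python sketch below illustrates that contrast under heavily simplified, hypothetical assumptions: an in-memory `BooleanSource` stands in for a Boolean-only source such as PubMed, and the ranking function simply adds a made-up positive weight for each query keyword a document contains (real IR functions also use term frequencies, which this toy ignores). It is not the authors' algorithm. It only shows why, for a ranking that is monotonic in the keywords present, querying keyword conjunctions in decreasing order of their best achievable score can stop early and still return the exact top-k.

```python
from itertools import combinations
from typing import AbstractSet, Dict, FrozenSet, List, Set, Tuple


class BooleanSource:
    """Hypothetical stand-in for a Boolean-only source (e.g. PubMed).

    Matches are returned unranked, as full documents (id -> terms), so the
    client must re-rank them. `docs_returned` counts every document shipped
    back, as a rough proxy for retrieval cost.
    """

    def __init__(self, docs: Dict[str, Set[str]]) -> None:
        self.docs = docs
        self.docs_returned = 0

    def _ship(self, hits: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
        self.docs_returned += len(hits)
        return hits

    def query_and(self, terms: FrozenSet[str]) -> Dict[str, Set[str]]:
        # Conjunctive query: documents containing every term.
        return self._ship({d: ts for d, ts in self.docs.items() if terms <= ts})

    def query_or(self, terms: FrozenSet[str]) -> Dict[str, Set[str]]:
        # Disjunctive query: documents containing at least one term.
        return self._ship({d: ts for d, ts in self.docs.items() if terms & ts})


def score(terms: AbstractSet[str], weights: Dict[str, float]) -> float:
    # Toy monotonic ranking (an assumption): add the positive weight of each
    # query keyword present; extra keywords never lower the score.
    return sum(w for kw, w in weights.items() if kw in terms)


def rank(found: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    # Highest score first; ties broken by document id for determinism.
    return sorted(found.items(), key=lambda item: (-item[1], item[0]))[:k]


def top_k_naive(src: BooleanSource, weights: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    # Baseline: one OR query over all keywords, then re-rank every match locally.
    hits = src.query_or(frozenset(weights))
    return rank({d: score(ts, weights) for d, ts in hits.items()}, k)


def top_k_conjunctive(src: BooleanSource, weights: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    kws = list(weights)
    # All non-empty keyword subsets, ordered by the score a document containing
    # exactly that subset would receive (best first).
    subsets = [frozenset(c) for r in range(len(kws), 0, -1) for c in combinations(kws, r)]
    subsets.sort(key=lambda s: score(s, weights), reverse=True)

    found: Dict[str, float] = {}
    for i, subset in enumerate(subsets):
        for d, ts in src.query_and(subset).items():
            found.setdefault(d, score(ts, weights))
        # An unseen document contains no subset queried so far, so its own
        # keyword set is still unqueried and it scores at most the next subset.
        bound = score(subsets[i + 1], weights) if i + 1 < len(subsets) else float("-inf")
        top = rank(found, k)
        # Strict '>' ensures every document tied with the k-th score was seen,
        # so ties break exactly as in the naive re-ranking.
        if len(top) == k and top[-1][1] > bound:
            break
    return rank(found, k)


docs = {
    "d1": {"gene", "therapy"}, "d2": {"cancer"}, "d3": {"cancer", "gene", "therapy"},
    "d4": {"therapy"}, "d5": {"cancer", "gene"}, "d6": {"gene"},
    "d7": {"cancer"}, "d8": {"trial"},
}
weights = {"cancer": 1.0, "gene": 2.0, "therapy": 3.0}  # made-up keyword weights

naive_src, conj_src = BooleanSource(docs), BooleanSource(docs)
print(top_k_naive(naive_src, weights, 2), naive_src.docs_returned)  # [('d3', 6.0), ('d1', 5.0)] 7
print(top_k_conjunctive(conj_src, weights, 2), conj_src.docs_returned)  # [('d3', 6.0), ('d1', 5.0)] 3
```

On this toy data both strategies return the same top-2 documents, but the conjunctive strategy ships 3 documents instead of 7 because it can stop once no unqueried keyword combination could outscore the current k-th result. The sketch enumerates all non-empty keyword subsets up front, which is only reasonable for short queries.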

Number of Pages in PDF File: 14

Keywords: Hidden-web databases, Keyword Search, Top-k ranking

Accepted Paper Series



Date posted: October 6, 2009; Last revised: February 16, 2012

Suggested Citation

Hristidis, Vagelis, Hu, Yuheng and Ipeirotis, Panagiotis G., Relevance-based Retrieval on Hidden-Web Text Databases without Ranking Support (September 1, 2009). IEEE Transactions on Knowledge and Data Engineering. Available at SSRN: http://ssrn.com/abstract=1483479

Contact Information

Vagelis Hristidis (Contact Author)
affiliation not provided to SSRN
No Address Available
Yuheng Hu
affiliation not provided to SSRN
No Address Available
Panagiotis G. Ipeirotis
New York University - Leonard N. Stern School of Business
44 West Fourth Street
Ste 8-84
New York, NY 10012
United States
+1-212-998-0803 (Phone)
HOME PAGE: http://www.stern.nyu.edu/~panos


Paper statistics
Abstract Views: 252
Downloads: 18
References:  21
