User-Centric Operational Decision-Making in Distributed Information Retrieval

Information Systems Research, Forthcoming

43 Pages. Posted: 30 Aug 2006. Last revised: 25 Feb 2015

Kartik Hosanagar

University of Pennsylvania - Operations & Information Management Department

Date Written: December 1, 2008

Abstract

Information specialists in enterprises and consumers on the Internet regularly use Distributed Information Retrieval (DIR) systems that query a large number of Information Retrieval (IR) systems, merge the retrieved results, and display them to users. The quality of results returned by different IR servers can vary considerably. Further, because servers handle collections of different sizes and have different processing and bandwidth capacities, their response times can also vary considerably. The broker in the DIR system thus has to decide which servers to query, how long to wait for responses, and which retrieved results to display, based on the benefits and costs imposed on users. The benefit of querying more servers and waiting longer is the ability to retrieve more documents. The costs may take the form of access fees charged by IR servers or the user's cost of waiting for the servers to respond. We formulate the broker's decision problem as a stochastic mixed integer program. We present closed-form results for the optimal query set and wait time in the special case in which the relevance scores and response times of the IR servers are independent and identically distributed. When servers are heterogeneous, we present a simulation-based optimization technique and demonstrate how the optimal query set and wait time may be determined. The technique is computationally efficient and can be used to generate decision rules for source selection and query termination that are relatively easy to implement. We use data gathered from two different contexts - a DIR system that queries IR engines of several US federal agencies and a comparison shopping engine that queries multiple stores for price and product information - to validate our technique. Our research demonstrates that user satisfaction can be considerably improved by modeling user utility and incorporating historical information on the performance of the IR servers.
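
To make the broker's trade-off concrete, the following is a minimal illustrative sketch in Python, not the paper's formulation or algorithm. It assumes a toy utility (relevance benefit of servers that respond in time, minus access fees, minus a linear waiting cost), exponentially distributed response times and relevance values, and brute-force search over query sets and a grid of wait times. The server names, parameter values, and function names are all hypothetical.

```python
import itertools
import random

# Hypothetical server profiles (assumed values, not from the paper):
# name -> (mean response time in seconds, mean relevance benefit, access fee)
SERVERS = {
    "fast_low": (0.5, 1.0, 0.1),
    "slow_high": (3.0, 4.0, 0.5),
    "slow_low": (4.0, 0.5, 0.5),
}
WAIT_COST = 0.4  # assumed user cost per second of waiting


def simulate_draws(n_draws, seed=0):
    """Sample (response_time, relevance) for every server, n_draws times."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        draw = {}
        for name, (mean_rt, mean_rel, _fee) in SERVERS.items():
            # Exponential distributions are an illustrative assumption.
            draw[name] = (rng.expovariate(1.0 / mean_rt),
                          rng.expovariate(1.0 / mean_rel))
        draws.append(draw)
    return draws


def expected_utility(query_set, wait_time, draws):
    """Monte Carlo estimate of utility for a non-empty query set and a wait time."""
    fees = sum(SERVERS[name][2] for name in query_set)
    total = 0.0
    for draw in draws:
        benefit = 0.0
        slowest = 0.0
        for name in query_set:
            response_time, relevance = draw[name]
            slowest = max(slowest, response_time)
            if response_time <= wait_time:
                benefit += relevance  # only timely responses count
        # The broker stops early once every queried server has responded.
        elapsed = min(wait_time, slowest)
        total += benefit - fees - WAIT_COST * elapsed
    return total / len(draws)


def best_decision(draws, wait_grid):
    """Brute-force search; querying no server (utility 0) is the baseline."""
    best = (0.0, (), 0.0)
    names = sorted(SERVERS)
    for size in range(1, len(names) + 1):
        for query_set in itertools.combinations(names, size):
            for wait_time in wait_grid:
                utility = expected_utility(query_set, wait_time, draws)
                if utility > best[0]:
                    best = (utility, query_set, wait_time)
    return best


if __name__ == "__main__":
    sampled = simulate_draws(2000, seed=42)
    utility, query_set, wait_time = best_decision(sampled, [0.5, 1.0, 2.0, 4.0, 8.0])
    print(f"query {query_set}, wait {wait_time}s, estimated utility {utility:.3f}")
```

Exhaustive enumeration is only workable for a handful of servers; it is used here purely to show how historical or simulated performance data can drive the choice of query set and wait time.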

Keywords: Distributed IR, metasearch, Patent search, Optimal operational decisions, Utility theory, Source selection, Query termination

Suggested Citation

Hosanagar, Kartik, User-Centric Operational Decision-Making in Distributed Information Retrieval (December 1, 2008). Information Systems Research, Forthcoming. Available at SSRN: https://ssrn.com/abstract=926928 or http://dx.doi.org/10.2139/ssrn.926928

Kartik Hosanagar (Contact Author)

University of Pennsylvania - Operations & Information Management Department

Philadelphia, PA 19104
United States
