User-Centric Operational Decision-Making in Distributed Information Retrieval

Kartik Hosanagar
University of Pennsylvania - Operations & Information Management Department

December 1, 2008

Information Systems Research, Forthcoming

Abstract: Information specialists in enterprises and consumers on the Internet regularly use Distributed Information Retrieval (DIR) systems that query a large number of Information Retrieval (IR) systems, merge the retrieved results, and display them to users. The quality of results returned by different IR servers can vary considerably. Further, because different servers handle collections of different sizes and have different processing and bandwidth capacities, their response times can also vary considerably. The broker in the distributed IR system thus has to decide which servers to query, how long to wait for responses, and which retrieved results to display, based on the benefits and costs imposed on users. The benefit of querying more servers and waiting longer is the ability to retrieve more documents. The costs may take the form of access fees charged by IR servers or the cost to users of waiting for the servers to respond. We formulate the broker's decision problem as a stochastic mixed integer program. We present closed-form results for the optimal query set and wait time in the special case when the relevance scores and response times of the IR servers are independent and identically distributed. When servers are heterogeneous, we present a simulation-based optimization technique and demonstrate how the optimal query set and wait time may be determined. The technique is computationally efficient and can be used to generate decision rules for source selection and query termination that are relatively easy to implement.
We validate our technique using data gathered from two different contexts: a DIR system that queries the IR engines of several US federal agencies, and a comparison shopping engine that queries multiple stores for price and product information. Our research demonstrates that user satisfaction can be considerably improved by modeling user utility and incorporating historical information on the performance of the IR servers.
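
The broker's trade-off described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's formulation or algorithm: it assumes exponentially distributed response times, a fixed per-server benefit and access fee, a linear waiting cost, and a brute-force search over query sets and a grid of wait times. All names (`expected_utility`, `optimize`, `benefit`, `fee`, `mean_time`, `wait_cost`) are hypothetical and chosen only for illustration.

```python
import itertools
import random


def expected_utility(servers, query_set, wait, samples, wait_cost):
    """Average user utility of querying `query_set` and waiting at most `wait`."""
    fees = sum(servers[i]["fee"] for i in query_set)
    total = 0.0
    for draw in samples:
        # Only servers that answer before the deadline contribute results.
        benefit = sum(servers[i]["benefit"] for i in query_set if draw[i] <= wait)
        # The user stops waiting at the deadline or once every server has answered.
        elapsed = min(wait, max(draw[i] for i in query_set))
        total += benefit - fees - wait_cost * elapsed
    return total / len(samples)


def optimize(servers, wait_grid, wait_cost, n_samples=2000, seed=0):
    """Return (utility, query_set, wait) with the highest simulated utility."""
    rng = random.Random(seed)
    # Assumption: exponential response times. Draw them once and reuse the
    # same draws for every candidate decision so comparisons are consistent.
    samples = [
        [rng.expovariate(1.0 / s["mean_time"]) for s in servers]
        for _ in range(n_samples)
    ]
    best = (0.0, (), 0.0)  # baseline: query no server, utility zero
    for size in range(1, len(servers) + 1):
        for query_set in itertools.combinations(range(len(servers)), size):
            for wait in wait_grid:
                utility = expected_utility(
                    servers, query_set, wait, samples, wait_cost
                )
                if utility > best[0]:
                    best = (utility, query_set, wait)
    return best


if __name__ == "__main__":
    # Hypothetical servers: one fast and useful, one slow and costly.
    servers = [
        {"benefit": 10.0, "fee": 0.0, "mean_time": 1.0},
        {"benefit": 2.0, "fee": 3.0, "mean_time": 5.0},
    ]
    utility, query_set, wait = optimize(servers, [0.5, 1.0, 2.0, 4.0], wait_cost=0.5)
    print(query_set, wait, round(utility, 2))
```

Enumerating every subset is only practical for a handful of servers. It is used here to keep the sketch short, not because it scales.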
Number of Pages in PDF File: 43

Keywords: Distributed IR, metasearch, Patent search, Optimal operational decisions, Utility theory, Source selection, Query termination

Working Papers Series

Date posted: August 30, 2006; Last revised: May 8, 2012