Content & Structure Coverage: Extracting a Diverse Information Subset
INFORMS Journal on Computing, 2017, 29(4): 660-675
Posted: 17 Oct 2014
Last revised: 13 Aug 2017
Date Written: August 11, 2017
Abstract
Recent years have witnessed a rapid increase in online data volume and a growing challenge of information overload for Web use and applications; information diversity is therefore of great importance to both providers and users of search services. Based on a proposed diversity evaluation measure, namely information coverage, two heuristic methods with corresponding algorithms are designed upon the greedy submodular idea. First, this paper proposes the CovC S-Select algorithm, which is asymptotically optimal, to optimize information coverage by applying a simulated annealing strategy. Then, to accelerate CovC S-Select, a fast approximation called FastCovC S-Select is introduced that exploits the properties of information coverage. Furthermore, extensive experiments demonstrate the effectiveness, efficiency, and parameter robustness of the proposed diversity-oriented extraction methods, and comparative analyses show their advantages over other related methods.
Keywords: Information diversity, Information content coverage, Information structure coverage, Submodularity, Fast approximation