Active Sampling for Class Probability Estimation and Ranking
34 Pages Posted: 13 Oct 2008
Date Written: 2001
Abstract
In many cost-sensitive environments, class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it often is very costly to obtain training data with class labels. Active sampling acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features for an active sampling approach and present an active sampling method for estimating class probabilities and ranking. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and by accounting for a particular data item's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active sampling method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to exhibit a certain class probability estimation accuracy and provide insights on the behavior of the algorithms. Finally, to further our understanding of the contributions made by the elements of BOOTSTRAP-LV, we experiment with a new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV compared to UNCERTAINTY SAMPLING. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
Keywords: active learning, class probability estimation, cost-sensitive learning
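The abstract describes selecting examples by the variance of class probability estimates, in contrast to uncertainty sampling. The following is a minimal Python sketch of that general idea, not the paper's implementation. Several pieces are assumptions made only for illustration: the nearest-neighbor base estimator, the one-dimensional inputs, the variance floor, drawing examples with probability proportional to variance, and every function and parameter name (knn_probability, bootstrap_variances, sample_by_variance, uncertainty_scores, n_models, floor).

```python
import random
from statistics import pvariance


def knn_probability(train, x, k=5):
    """Estimate P(class = 1 | x) as the share of positives among the k
    nearest labeled points. `train` is a list of (x, label) pairs with
    labels 0 or 1. This base learner is an illustrative assumption."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def bootstrap_variances(labeled, pool, n_models=10, k=5, rng=None):
    """For each unlabeled x in `pool`, compute the variance of the
    probability estimates produced by models built on bootstrap
    resamples of the labeled data."""
    if rng is None:
        rng = random.Random(0)
    resamples = [rng.choices(labeled, k=len(labeled)) for _ in range(n_models)]
    return [
        pvariance([knn_probability(sample, x, k) for sample in resamples])
        for x in pool
    ]


def sample_by_variance(variances, batch_size, rng=None, floor=1e-6):
    """Draw distinct pool indices with probability proportional to
    variance (assumed selection rule). The small floor keeps
    zero-variance items selectable."""
    if rng is None:
        rng = random.Random(0)
    candidates = list(range(len(variances)))
    weights = [v + floor for v in variances]
    chosen = []
    for _ in range(min(batch_size, len(candidates))):
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return chosen


def uncertainty_scores(labeled, pool, k=5):
    """Uncertainty-sampling style score for comparison: highest when
    the estimated probability is closest to 0.5."""
    return [0.5 - abs(knn_probability(labeled, x, k) - 0.5) for x in pool]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: the label is 1 when x > 0.5, with 10% label noise.
    labeled = []
    for _ in range(30):
        x = rng.random()
        noisy = rng.random() < 0.1
        labeled.append((x, int((x > 0.5) != noisy)))
    pool = [rng.random() for _ in range(100)]

    variances = bootstrap_variances(labeled, pool, rng=rng)
    to_label = sample_by_variance(variances, batch_size=5, rng=rng)
    print("pool indices to label next:", to_label)
```

In this sketch, a variance-based score targets examples whose probability estimate is unstable across resamples, whereas the uncertainty score targets examples whose estimate is near 0.5 regardless of how stable it is. Sampling from a distribution, rather than always taking the top-scoring items, spreads the selected examples over the pool.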
Recommended Papers
- Active Feature-Value Acquisition
  By Maytal Saar-tsechansky, Prem Melville, ...
- Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers
  By Victor Sheng, Foster Provost, ...
- Repeated Labeling Using Multiple Noisy Labelers
  By Panagiotis G. Ipeirotis, Foster Provost, ...