Active Sampling for Class Probability Estimation and Ranking

34 Pages Posted: 13 Oct 2008

Maytal Saar-Tsechansky

New York University (NYU)

Foster Provost

New York University (NYU) - Department of Information, Operations, and Management Sciences

Date Written: 2001

Abstract

In many cost-sensitive environments class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it often is very costly to obtain training data with class labels. Active sampling acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features for an active sampling approach and present an active sampling method for estimating class probabilities and ranking. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and by accounting for a particular data item's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active sampling method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to exhibit a certain class probability estimation accuracy and provide insights on the behavior of the algorithms. Finally, to further our understanding of the contributions made by the elements of BOOTSTRAP-LV, we experiment with a new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV compared to UNCERTAINTY SAMPLING. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
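
The two selection ideas contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' BOOTSTRAP-LV implementation: the one-dimensional k-nearest-neighbor estimator, the function names, the number of bootstrap models, and the score-proportional sampling step are all assumptions chosen for illustration. It scores each unlabeled example by the variance of its class probability estimates across models trained on bootstrap resamples, and, for comparison, by how close a single estimate lies to 0.5 (an uncertainty-style score).

```python
import random
from statistics import pvariance


def knn_probability(train, x, k=3):
    """Estimate P(class = 1 | x) as the share of positives among the k nearest labeled points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def bootstrap_variance_scores(labeled, pool, n_models=10, k=3, rng=None):
    """Score each pool item by the variance of its probability estimates
    across estimators trained on bootstrap resamples of the labeled data."""
    rng = rng or random.Random(0)
    resamples = [[rng.choice(labeled) for _ in labeled] for _ in range(n_models)]
    return [
        pvariance([knn_probability(sample, x, k) for sample in resamples])
        for x in pool
    ]


def uncertainty_scores(labeled, pool, k=3):
    """Uncertainty-style baseline: the score is highest when the estimate is near 0.5."""
    return [0.5 - abs(knn_probability(labeled, x, k) - 0.5) for x in pool]


def sample_by_score(pool, scores, n, rng=None):
    """Draw up to n distinct pool items, each draw weighted by score
    (uniformly at random if all remaining scores are zero)."""
    rng = rng or random.Random(0)
    candidates = list(zip(pool, scores))
    chosen = []
    for _ in range(min(n, len(candidates))):
        total = sum(score for _, score in candidates)
        idx = len(candidates) - 1
        if total <= 0:
            idx = rng.randrange(len(candidates))
        else:
            threshold = rng.uniform(0, total)
            running = 0.0
            for i, (_, score) in enumerate(candidates):
                running += score
                if threshold < running:
                    idx = i
                    break
        chosen.append(candidates.pop(idx)[0])
    return chosen


# Hypothetical toy data: (feature, label) pairs and an unlabeled pool.
labeled = [(0.0, 0), (0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
pool = [0.05, 0.5, 0.95]
variance = bootstrap_variance_scores(labeled, pool)
to_label = sample_by_score(pool, variance, n=2)
```

Drawing examples with probability proportional to their score, rather than always taking the top-scoring ones, is one simple way to keep some coverage of the rest of the input space; whether it matches the weighting used in the paper is not stated in the abstract.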

Keywords: active learning, class probability estimation, cost-sensitive learning

Suggested Citation

Saar-Tsechansky, Maytal and Provost, Foster, Active Sampling for Class Probability Estimation and Ranking (2001). NYU Working Paper No. 2451/14165, Available at SSRN: https://ssrn.com/abstract=1283007

Maytal Saar-Tsechansky

New York University (NYU)

Bobst Library, E-resource Acquisitions
20 Cooper Square 3rd Floor
New York, NY 10003-711
United States

Foster Provost

New York University (NYU) - Department of Information, Operations, and Management Sciences

44 West Fourth Street
New York, NY 10012
United States
