Active Feature-Value Acquisition

33 Pages Posted: 16 Nov 2006

Maytal Saar-Tsechansky

University of Texas at Austin

Prem Melville

IBM Research

Foster Provost

New York University

Date Written: July 18, 2006


Most induction algorithms for building predictive models take as input training data in the form of feature vectors. Acquiring the values of features may be costly, and simply acquiring all values may be wasteful, or even prohibitively expensive. Active feature-value acquisition (AFA) selects features incrementally in an attempt to improve the predictive model most cost-effectively. This paper presents a framework for AFA based on estimating information value. While the framework is straightforward in principle, estimations and approximations must be made to apply it in practice. We present an acquisition policy, Sampled Expected Utility (SEU), that employs particular estimations to enable effective ranking of potential acquisitions in settings where relatively little information is available about the underlying domain. We then present experimental results showing that, compared with the policy of using representative sampling for feature acquisition, SEU indeed reduces the cost of producing a model of a desired accuracy and exhibits consistent performance across domains. We also show that we can improve considerably over a recently published policy for instance completion, a special case of AFA. Finally, we demonstrate additional promise of the expected utility framework by applying it to the even more general modeling setting in which feature values as well as class labels may be missing and are costly to acquire. This is done by treating the class label as an additional feature, thus combining the settings of AFA and traditional active learning.
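
The abstract does not spell out the estimators used by SEU, so the following is only an illustrative sketch of the general idea of ranking a sampled set of candidate acquisitions by expected utility per unit cost. All names here (`expected_utility`, `sampled_ranking`, `gain_if_value`) are hypothetical, and the value probabilities and gains are assumed to come from estimators supplied by the caller rather than from the paper's own method.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical representation: (instance index, feature index) of a missing value.
Candidate = Tuple[int, int]


def expected_utility(
    value_probs: Dict[Hashable, float],
    gain_if_value: Callable[[Hashable], float],
    cost: float,
) -> float:
    """Expected model improvement per unit cost for one candidate acquisition.

    value_probs: assumed estimate of P(missing feature = v) for each value v.
    gain_if_value: assumed estimate of the model's improvement if the
        acquired value turns out to be v.
    cost: price of acquiring this feature value (must be positive).
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    expected_gain = sum(p * gain_if_value(v) for v, p in value_probs.items())
    return expected_gain / cost


def sampled_ranking(
    candidates: Sequence[Candidate],
    score: Callable[[Candidate], float],
    sample_size: int,
    rng: random.Random,
) -> List[Tuple[Candidate, float]]:
    """Score a uniform random subset of candidates and sort it best-first.

    Scoring only a sample keeps the work bounded when the pool of missing
    values is large; uniform sampling is an assumption of this sketch.
    """
    k = min(sample_size, len(candidates))
    sampled = rng.sample(list(candidates), k)
    scored = [(c, score(c)) for c in sampled]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In use, a caller would build `score` by combining `expected_utility` with its own probability and gain estimates, acquire the top-ranked values, retrain the model, and repeat until the acquisition budget is spent.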

Keywords: Information acquisition, predictive modeling

Suggested Citation

Saar-Tsechansky, Maytal and Melville, Prem and Provost, Foster, Active Feature-Value Acquisition (July 18, 2006). McCombs Research Paper Series No. IROM-08-06, Available at SSRN: or

Maytal Saar-Tsechansky (Contact Author)

University of Texas at Austin

Austin, TX 78712
United States

Prem Melville

IBM Research

T. J. Watson Research Center
1 New Orchard Road
Armonk, NY 10504-1722

Foster Provost

New York University

44 West Fourth Street
New York, NY 10012
United States