
http://ssrn.com/abstract=1688193

Repeated Labeling Using Multiple Noisy Labelers


Panagiotis G. Ipeirotis
New York University - Leonard N. Stern School of Business

Foster Provost
New York University

Victor Sheng
affiliation not provided to SSRN

Jing Wang
affiliation not provided to SSRN

October 20, 2012

NYU Working Paper No. CEDER-10-03

Abstract:     
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Amazon's Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give a considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a set of robust techniques that combine different notions of uncertainty to select data points for which quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire. For certain label-quality/cost regimes, the benefit is substantial.
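Why result (i) says "but not always" can be seen from a standard binomial argument, under simplifying assumptions that are ours rather than the paper's: binary labels, independent labelers that are each correct with the same probability p, and an odd number of labels combined by majority vote. The sketch below computes that majority-vote accuracy. It also illustrates one possible notion of label uncertainty for choosing which items to relabel, as in result (iv): a Beta-posterior score computed from each item's positive/negative vote counts. This score is only an illustration; the paper combines several notions of uncertainty and may define them differently. All function names (`majority_vote_quality`, `label_uncertainty`, `select_for_relabeling`) are hypothetical.

```python
from math import comb


def majority_vote_quality(p: float, n_labels: int) -> float:
    """Probability that a majority vote over n_labels independent binary
    labels, each correct with probability p, recovers the true label.

    Assumption: n_labels is odd, so ties cannot occur.
    """
    if n_labels < 1 or n_labels % 2 == 0:
        raise ValueError("n_labels must be a positive odd integer")
    needed = n_labels // 2 + 1  # fewest correct votes that still win
    return sum(
        comb(n_labels, k) * p**k * (1 - p) ** (n_labels - k)
        for k in range(needed, n_labels + 1)
    )


def label_uncertainty(n_pos: int, n_neg: int) -> float:
    """Hypothetical uncertainty score for an item's majority label.

    Treats the item's positive-label rate as Beta(n_pos + 1, n_neg + 1)
    (uniform prior) and returns the smaller of the posterior masses below
    and above 0.5. For integer parameters the Beta CDF reduces to a
    binomial tail: I_0.5(a, b) = P(Binomial(a + b - 1, 0.5) >= a).
    """
    a, b = n_pos + 1, n_neg + 1
    n = a + b - 1
    cdf_at_half = sum(comb(n, j) for j in range(a, n + 1)) / 2**n
    return min(cdf_at_half, 1.0 - cdf_at_half)


def select_for_relabeling(label_counts: list[tuple[int, int]], k: int) -> list[int]:
    """Indices of the k items whose current labels are most uncertain.

    label_counts[i] is (positive votes, negative votes) for item i; ties
    keep the original item order because Python's sort is stable.
    """
    order = sorted(
        range(len(label_counts)),
        key=lambda i: label_uncertainty(*label_counts[i]),
        reverse=True,
    )
    return order[:k]


if __name__ == "__main__":
    # Majority-vote accuracy with 1, 3, and 5 labels per item.
    for p in (0.7, 0.5, 0.4):
        print(p, [round(majority_vote_quality(p, n), 3) for n in (1, 3, 5)])
    counts = [(3, 0), (2, 1), (1, 1), (0, 0), (0, 4)]
    print(select_for_relabeling(counts, 3))  # → [2, 3, 1]
```

Under these assumptions, labelers with p = 0.7 go from 0.7 accuracy with one label to about 0.837 with five, labelers with p = 0.4 drop from 0.4 to about 0.317, and p = 0.5 stays at 0.5 however many labels are collected. The uncertainty score ranks the 1-vs-1 and unlabeled items highest, the 2-vs-1 item next, and the unanimous items last.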

Number of Pages in PDF File: 31

Keywords: active learning, data selection, data preprocessing, classification, crowdsourcing, mechanical turk, noisy data


Date posted: October 6, 2010; Last revised: October 21, 2012

Suggested Citation

Ipeirotis, Panagiotis G. and Provost, Foster and Sheng, Victor and Wang, Jing, Repeated Labeling Using Multiple Noisy Labelers (October 20, 2012). NYU Working Paper No. CEDER-10-03. Available at SSRN: http://ssrn.com/abstract=1688193

Contact Information

Panagiotis G. Ipeirotis (Contact Author)
New York University - Leonard N. Stern School of Business
44 West Fourth Street
Ste 8-84
New York, NY 10012
United States
+1-212-998-0803 (Phone)
HOME PAGE: http://www.stern.nyu.edu/~panos

Foster Provost
New York University
44 West Fourth Street
New York, NY 10012
United States

Victor Sheng
affiliation not provided to SSRN

Jing Wang
affiliation not provided to SSRN


References:  53
