Statistical Decision Making for Optimal Budget Allocation in Crowd Labeling

38 Pages. Posted: 13 Mar 2014

Xi Chen

New York University (NYU) - Leonard N. Stern School of Business

Qihang Lin

Carnegie Mellon University - David A. Tepper School of Business

Dengyong Zhou

Microsoft Corporation - Microsoft Research, New York City

Date Written: March 12, 2014

Abstract

In crowd labeling, a large number of unlabeled data instances are outsourced to a crowd of workers. Workers are paid for each label they provide, but the labeling requester usually has only a limited budget. Since data instances differ in labeling difficulty and workers differ in reliability, it is desirable to have an optimal policy for allocating the budget among all instance-worker pairs so that the overall labeling accuracy is maximized. We consider categorical labeling tasks and formulate the budget allocation problem as a Bayesian Markov decision process (MDP), which simultaneously conducts learning and decision making. Using the dynamic programming (DP) recurrence, one can obtain the optimal allocation policy. However, DP quickly becomes computationally intractable as the problem size increases. To address this challenge, we propose a computationally efficient approximate policy, called the optimistic knowledge gradient policy. Our MDP is a general framework that applies to both pull crowdsourcing marketplaces with homogeneous workers and push marketplaces with heterogeneous workers. It can also incorporate contextual information about instances when it is available. Experiments on both simulated and real data show that the proposed policy achieves higher labeling accuracy than existing policies at the same budget level.
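To make the idea of sequential budget allocation concrete, the sketch below illustrates one plausible reading of an optimistic knowledge-gradient style rule. It is not taken from the paper. It assumes a simplified setting that the abstract does not spell out: binary labels, identical workers, and a Beta(1, 1) prior on each instance's probability of receiving a positive label. All function names (`prob_positive`, `accuracy`, `optimistic_gain`, `allocate`, `predict`) and the simulated worker responses are hypothetical.

```python
import random
from math import comb


def prob_positive(a, b):
    """P(theta >= 0.5) for theta ~ Beta(a, b), with positive integer a and b.

    For integer parameters this equals P(Binomial(a + b - 1, 0.5) <= a - 1).
    """
    n = a + b - 1
    return sum(comb(n, j) for j in range(a)) / 2 ** n


def accuracy(a, b):
    """Expected accuracy of labeling the instance by its more likely class."""
    p = prob_positive(a, b)
    return max(p, 1 - p)


def optimistic_gain(a, b):
    """Best-case one-step accuracy gain over the two possible next labels."""
    now = accuracy(a, b)
    return max(accuracy(a + 1, b) - now, accuracy(a, b + 1) - now)


def allocate(true_theta, budget, seed=0):
    """Spend `budget` labels one at a time on the instance with the largest
    optimistic gain. Worker answers are simulated from `true_theta`, which is
    an assumption made purely for illustration."""
    rng = random.Random(seed)
    counts = [[1, 1] for _ in true_theta]  # assumed Beta(1, 1) prior per instance
    for _ in range(budget):
        i = max(range(len(counts)), key=lambda k: optimistic_gain(*counts[k]))
        if rng.random() < true_theta[i]:
            counts[i][0] += 1  # simulated positive label
        else:
            counts[i][1] += 1  # simulated negative label
    return counts


def predict(counts):
    """Predict positive (1) when P(theta >= 0.5) >= 0.5, i.e. when a >= b."""
    return [1 if a >= b else 0 for a, b in counts]


if __name__ == "__main__":
    posterior = allocate([0.9, 0.55, 0.1], budget=30)
    print(posterior, predict(posterior))
```

In this simplified setting, the greedy rule stands in for the intractable DP recurrence: instead of planning over all future label outcomes, it scores each instance by the best-case improvement from a single additional label and asks for that label next.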

Keywords: Crowdsourcing, Crowd labeling, Budget allocation, Markov decision process, Optimistic knowledge gradient

JEL Classification: C44, C11, C61, C63

Suggested Citation

Chen, Xi and Lin, Qihang and Zhou, Dengyong, Statistical Decision Making for Optimal Budget Allocation in Crowd Labeling (March 12, 2014). Available at SSRN: https://ssrn.com/abstract=2408163 or http://dx.doi.org/10.2139/ssrn.2408163

Xi Chen (Contact Author)

New York University (NYU) - Leonard N. Stern School of Business

44 West 4th Street
Suite 9-160
New York, NY 10012
United States

Qihang Lin

Carnegie Mellon University - David A. Tepper School of Business

5000 Forbes Avenue
Pittsburgh, PA 15213-3890
United States

Dengyong Zhou

Microsoft Corporation - Microsoft Research, New York City

641 Avenue of Americas
New York, NY 10011
United States
