Enhancing Big Data in the Social Sciences with Crowdsourcing: Data Augmentation Practices, Techniques, and Opportunities

39 pages. Posted: 28 Sep 2016. Last revised: 27 May 2017.

Nathaniel Porter

Pennsylvania State University

Ashton Verdery

Pennsylvania State University

S. Michael Gaddis

University of California, Los Angeles (UCLA) - Department of Sociology; University of California, Los Angeles (UCLA) - California Center for Population Research

Date Written: May 1, 2017

Abstract

The importance of big data is a contested topic among social scientists. Proponents claim it will fuel a research revolution, but skeptics challenge it as unreliably measured and decontextualized, with limited utility for accurately answering social science research questions. We argue that social scientists need effective tools to quantify big data’s measurement error and expand the contextual information associated with it. Standard research efforts in many fields already pursue these goals through data augmentation, the systematic assessment of measurement against known quantities and expansion of extant data by adding new information. Traditionally, these tasks are accomplished using trained research assistants or specialized algorithms. However, such approaches may not be scalable to big data or appease its skeptics. We consider a third alternative that may increase the validity and value of big data: data augmentation with online crowdsourcing. We present three empirical cases to illustrate the strengths and limits of crowdsourcing for academic research, with a particular eye to how they can be applied to data augmentation tasks that will accelerate acceptance of big data among social scientists. The cases use Amazon Mechanical Turk to (1) verify automated coding of the academic discipline of dissertation committee members, (2) link online product pages to a book database, and (3) gather data on mental health resources at colleges. In light of these cases, we consider the costs and benefits of augmenting big data with crowdsourcing marketplaces and provide guidelines on best practices. We also offer a standardized reporting template that will enhance reproducibility.

Keywords: Big Data, Mechanical Turk, MTurk, Crowdsourcing

JEL Classification: A10, C80

Suggested Citation

Porter, Nathaniel and Verdery, Ashton and Gaddis, S. Michael, Enhancing Big Data in the Social Sciences with Crowdsourcing: Data Augmentation Practices, Techniques, and Opportunities (May 1, 2017). Available at SSRN: https://ssrn.com/abstract=2844155 or http://dx.doi.org/10.2139/ssrn.2844155

Nathaniel Porter (Contact Author)

Pennsylvania State University

University Park
State College, PA 16802
United States

Ashton Verdery

Pennsylvania State University

University Park
State College, PA 16802
United States

S. Michael Gaddis

University of California, Los Angeles (UCLA) - Department of Sociology

405 Hilgard Avenue
Box 951361
Los Angeles, CA 90095
United States

University of California, Los Angeles (UCLA) - California Center for Population Research

337 Charles E Young Dr E
Los Angeles, CA 90095
United States
