The Shape of and Solutions to the MTurk Quality Crisis
27 Pages Posted: 3 Dec 2018
Date Written: October 24, 2018
Abstract
Amazon’s Mechanical Turk (MTurk) is widely used to collect affordable and high-quality survey responses. However, researchers recently noticed a substantial decline in data quality, sending shockwaves throughout the social sciences. The problem seems to stem from the use of Virtual Private Servers (VPSs) by respondents outside the U.S. to fool MTurk’s filtering system, but we know relatively little about the causes and consequences of this form of fraud. Analyzing 38 studies conducted on MTurk, we demonstrate that this problem is not new: we find a similar spike in VPS use in 2015. Using two new studies, we show that data from these respondents is of substantially worse quality. Next, we provide two solutions to this problem, both built on the API of an IP traceback application (IP Hub). The first is a post-hoc method for identifying fraudulent respondents using an original R package (“rIP”) and an associated online application. The second is an a priori method that uses JavaScript and PHP code in Qualtrics to block fraudulent respondents from participating. We demonstrate the effectiveness of the screening procedure in a third study. Overall, our results suggest that fraudulent respondents pose a serious threat to data quality but can be easily identified and screened out.
Keywords: Mechanical Turk, MTurk, Fraud, Quality, Online Surveys, Survey Experiments
