Methods for Collecting Large-Scale Non-Expert Text Coding

49 Pages Posted: 4 May 2013  

Drew Conway

New York University (NYU) - Department of Politics

Date Written: May 3, 2013


The task of coding text into discrete categories or quantifiable scales is a classic problem in political science. Traditionally, this task is executed by qualified "experts." While productive, this method is time consuming, resource intensive, and introduces bias. In the following paper I present the findings from a series of experiments developed to assess the viability of using crowd-sourcing platforms for political text coding, and how variations in the collection mechanism affect the quality of output. To do this, the labor pool available on Amazon's Mechanical Turk platform was asked to identify policy statements and positions in a text corpus of party manifestos. To evaluate the quality of the non-expert codings, this text corpus is also coded by multiple experts for comparison. The evidence from these experiments shows that crowd-sourcing is an effective alternative means of generating quantitative categorization from text. The presence of a filter on workers increases the quality of output, but variations of that filter have little effect. The primary weakness of the non-experts participating in these experiments is their systematic inability to identify texts that contain no policy statement.
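The core evaluation described above, aggregating several non-expert codings per text unit and checking them against an expert benchmark, can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: the category labels, three-workers-per-sentence design, and majority-vote aggregation are assumptions for the example.

```python
from collections import Counter

def majority_code(codes):
    """Return the modal category among one sentence's crowd codes.
    Ties are broken by first-seen order (Counter.most_common)."""
    return Counter(codes).most_common(1)[0][0]

def agreement_rate(crowd_codes, expert_codes):
    """Fraction of sentences whose aggregated crowd code matches
    the expert code for the same sentence."""
    matches = sum(
        majority_code(codes) == expert
        for codes, expert in zip(crowd_codes, expert_codes)
    )
    return matches / len(expert_codes)

# Hypothetical codings: three workers per sentence, with a "none"
# category for texts containing no policy statement.
crowd = [
    ["economic", "economic", "social"],
    ["none", "social", "social"],
    ["none", "none", "economic"],
]
expert = ["economic", "social", "none"]
print(agreement_rate(crowd, expert))  # → 1.0
```

A systematic failure mode like the one reported in the abstract (missing "no policy statement" texts) would show up here as low agreement concentrated on sentences whose expert code is "none".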

Keywords: crowd-sourcing, text coding, mechanical turk, comparative manifestos

Suggested Citation

Conway, Drew, Methods for Collecting Large-Scale Non-Expert Text Coding (May 3, 2013). Available at SSRN.

Drew Conway (Contact Author)

New York University (NYU) - Department of Politics ( email )

Bobst Library, E-resource Acquisitions
20 Cooper Square 3rd Floor
New York, NY 10003-711
United States


