Downloading Wisdom from Online Crowds

54 Pages, Posted: 11 Nov 2008

Albert Saiz

MIT Department of Urban Studies and Planning; IZA Institute of Labor Economics

Uri Simonsohn

University of Pennsylvania - The Wharton School

There are 2 versions of this paper

Abstract

The internet and other large textual databases contain billions of documents. Is there useful information in the number of documents written about different topics? Based on the premise that the occurrence of a phenomenon increases the likelihood that people write about it, we propose that the relative frequency of documents discussing a phenomenon can serve as a proxy for how frequently that phenomenon occurs. After establishing the conditions under which such proxying is likely to succeed, we construct proxies for a number of demographic variables in the US and for corruption across countries and across US states and cities, obtaining average correlations with occurrence-frequencies of 0.47 and 0.61, respectively. We also replicate the results of two separate published papers that establish the correlates of corruption at the state and country levels. Finally, we construct the first index of corruption in US cities and study its correlates.
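The proxying idea described in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes, for illustration only, that the proxy for a place is the number of documents mentioning both the place and the phenomenon divided by the number of documents mentioning the place, and that validation means computing a Pearson correlation against a measured occurrence-frequency. The function names `relative_document_frequency` and `pearson`, the places "A", "B", "C", and all counts and rates are hypothetical.

```python
from math import sqrt


def relative_document_frequency(joint_hits: int, base_hits: int) -> float:
    """Share of documents about a unit (e.g. a city) that also mention the phenomenon.

    Assumed definition, for illustration only: hits(place AND phenomenon) / hits(place).
    """
    if base_hits <= 0:
        raise ValueError("base_hits must be positive")
    return joint_hits / base_hits


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between the proxy and the measured occurrence-frequency."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy inputs, NOT data from the paper:
# place -> (documents mentioning the place, documents mentioning the place and "corruption")
hits = {"A": (10_000, 300), "B": (5_000, 50), "C": (20_000, 1_000)}
# Hypothetical measured occurrence-frequency for the same places
measured = {"A": 3.1, "B": 0.9, "C": 5.2}

places = sorted(hits)
proxy = [relative_document_frequency(joint, base) for base, joint in (hits[p] for p in places)]
actual = [measured[p] for p in places]
print(f"correlation between proxy and measured frequency: {pearson(proxy, actual):.2f}")
```

Dividing by the base count, rather than using raw hit counts, is what makes the measure a relative frequency: a larger or more widely discussed place generates more documents on every topic, and the ratio removes that scale effect.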

Keywords: proxy variables, document-frequency, textual databases, internet

JEL Classification: J11, C81, B40

Suggested Citation

Saiz, Albert and Simonsohn, Uri, Downloading Wisdom from Online Crowds. IZA Discussion Paper No. 3809. Available at SSRN: https://ssrn.com/abstract=1298252 or http://dx.doi.org/10.1111/j.0042-7092.2007.00700.x

Albert Saiz (Contact Author)

MIT Department of Urban Studies and Planning

77 Massachusetts Avenue
50 Memorial Drive
Cambridge, MA 02139-4307
United States
617-252-1687 (Phone)
617-258-6991 (Fax)

IZA Institute of Labor Economics

P.O. Box 7240
Bonn, D-53072
Germany

Uri Simonsohn

University of Pennsylvania - The Wharton School

3730 Walnut Street
JMHH 500
Philadelphia, PA 19104-6365
United States


Paper statistics: Downloads: 121; Rank: 8,676; Abstract Views: 1,097