We Just Ran Twenty-Three Million Queries of the World Bank's Website
22 Pages. Posted: 25 Jun 2014. Last revised: 4 Jul 2014.
Date Written: April 28, 2014
Much of the data underlying global poverty and inequality estimates is not in the public domain, but can be accessed in small pieces using the World Bank’s PovcalNet online tool. To overcome this limitation and reproduce the database in a format more useful to researchers, we ran approximately 23 million queries of the World Bank’s website, accessing only information that was already in the public domain. This web scraping exercise produced 10,000 points on the cumulative distribution of income or consumption from each of 942 surveys spanning 127 countries over the period 1977 to 2012. This short note describes our methodology, briefly discusses some of the relevant intellectual property issues, and illustrates the kind of calculations that are facilitated by this data set, including growth incidence curves and poverty rates using alternative PPP indices.
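As a rough illustration of the calculations mentioned above, the following Python sketch shows how a poverty rate and a growth incidence curve could be computed from points on a cumulative distribution. It is not the authors' code: the function names (`headcount`, `growth_incidence`), the PPP conversion factors, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def headcount(incomes, cdf, poverty_line):
    """Share of the population below poverty_line, by linear
    interpolation between (income, cumulative share) points.
    Note: np.interp clamps to the end values outside the income range."""
    return float(np.interp(poverty_line, incomes, cdf))

def growth_incidence(q0, q1, years):
    """Annualized growth rate of income at each percentile, given the
    income at the same percentiles in two surveys `years` apart."""
    q0 = np.asarray(q0, dtype=float)
    q1 = np.asarray(q1, dtype=float)
    return (q1 / q0) ** (1.0 / years) - 1.0

# Synthetic CDF points in local currency (assumed, not real survey data).
incomes = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
cdf = np.array([0.2, 0.4, 0.6, 0.8, 1.0])

# A dollar poverty line converted with two hypothetical PPP factors.
line_usd = 1.25
rate_ppp_a = headcount(incomes, cdf, line_usd * 2.0)  # local line = 2.5
rate_ppp_b = headcount(incomes, cdf, line_usd * 1.6)  # local line = 2.0

# Growth incidence between two synthetic surveys two years apart.
gic = growth_incidence([1.0, 2.0, 3.0], [1.21, 2.42, 3.63], years=2)
```

With 10,000 points per survey, linear interpolation between neighbouring points is one simple choice; other interpolation schemes could be substituted without changing the structure of the calculation.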
Keywords: poverty, inequality, consumption, income distribution, open data
JEL Classification: D31, I32, O57