Tragedy of the Data Commons
67 Pages. Posted: 19 Mar 2011. Last revised: 22 Feb 2012
Date Written: March 18, 2011
Abstract
Accurate data is vital to enlightened research and policymaking, particularly publicly available data that is redacted to protect the identity of individuals. Legal academics, however, are campaigning against data anonymization as a means to protect privacy, contending that the wealth of information available on the Internet enables malfeasors to reverse-engineer the data and identify individuals within it. Privacy scholars advocate for new legal restrictions on the collection and dissemination of research data. This Article challenges the dominant wisdom, arguing that properly de-identified data is not only safe, but of extraordinary social utility. It makes three core claims. First, legal scholars have misinterpreted the relevant literature from computer science and statistics, and thus have significantly overstated the futility of anonymizing data. Second, the available evidence demonstrates that the risks from anonymized data are theoretical: they rarely, if ever, materialize. Finally, anonymized data is crucial to beneficial social research, and constitutes a public resource, a commons, under threat of depletion. The Article concludes with a radical proposal: since current privacy policies overtax valuable research without reducing any realistic risks, law should provide a safe harbor for the dissemination of research data.
Keywords: privacy, data privacy, data, anonymization, anonymity