Defining Privacy and Utility in Data Sets
Felix T. Wu
Yeshiva University - Benjamin N. Cardozo School of Law
August 15, 2012
University of Colorado Law Review, Forthcoming
Is it possible to release useful data while preserving the privacy of the individuals whose information is in the database? This question has been the subject of considerable controversy, particularly in the wake of well-publicized instances in which researchers showed how to re-identify individuals in supposedly anonymous data. Some have argued that privacy and utility are fundamentally incompatible, while others have suggested that simple steps can be taken to achieve both simultaneously. Both sides have looked to the computer science literature for support.
What the existing debate has overlooked, however, is that the relationship between privacy and utility depends crucially on what one means by “privacy” and what one means by “utility.” Apparently contradictory results in the computer science literature can be explained by the use of different definitions to formalize these concepts. Without sufficient attention to these definitional issues, it is all too easy to over-generalize the technical results. More importantly, there are nuances to how definitions of “privacy” and “utility” can differ from each other, nuances that matter for why a definition that is appropriate in one context may not be appropriate in another. Analyzing these nuances exposes the policy choices inherent in the choice of one definition over another, and thereby elucidates decisions about whether and how to regulate data privacy across varying social contexts.
Number of Pages in PDF File: 61
Keywords: information privacy, anonymization, re-identification
Accepted Paper Series
Date posted: April 1, 2012; Last revised: July 26, 2013