Defining Privacy and Utility in Data Sets
61 Pages Posted: 1 Apr 2012 Last revised: 8 Mar 2014
Is it possible to release useful data, while preserving the privacy of the individuals whose information is in the database? This question has been the subject of considerable controversy, particularly in the wake of well-publicized instances in which researchers showed how to re-identify individuals in supposedly anonymous data. Some have argued that privacy and utility are fundamentally incompatible, while others have suggested that simple steps can be taken to achieve both simultaneously. Both sides have looked to the computer science literature for support.
What the existing debate has overlooked, however, is that the relationship between privacy and utility depends crucially on what one means by “privacy” and what one means by “utility.” Apparently contradictory results in the computer science literature can be explained by the use of different definitions to formalize these concepts. Without sufficient attention to these definitional issues, it is all too easy to over-generalize the technical results. More importantly, there are nuances to how definitions of “privacy” and “utility” can differ from each other, nuances that matter for why a definition that is appropriate in one context may not be appropriate in another. Analyzing these nuances exposes the policy choices inherent in the choice of one definition over another, and thereby elucidates decisions about whether and how to regulate data privacy across varying social contexts.
Keywords: information privacy, anonymization, re-identification
Suggested Citation: Suggested Citation