Data Protection's Composition Problem

European Data Protection Law Review (EDPL), Vol. 5, Iss. 3 (2019) (Forthcoming)

8 Pages Posted: 29 Sep 2019

Aaron Fluitt

Institute for Technology Law & Policy

Aloni Cohen

Boston University - Hariri Institute for Computing, School of Law

Micah Altman

Massachusetts Institute of Technology (MIT) Libraries; The Brookings Institution

Kobbi Nissim

Georgetown University - Department of Computer Science

Salome Viljoen

Harvard University

Alexandra Wood

Harvard University - Berkman Klein Center for Internet & Society

Date Written: September 9, 2019

Abstract

Is it possible to piece together the confidential data of almost everyone in the US from statistics published by the Census Bureau—without breaching Census security or policy? Could someone—a doctor, a nosy neighbor, or a foreign state actor—determine whether a particular person participated in a genetic study of hundreds of individuals, when each individual contributed only tiny trace amounts of DNA to a highly complex and aggregated genetic mixture? Could police detectives re-trace a suspect’s every movement over the course of many months and thereby learn intimate details about the suspect’s political, religious, and sexual associations—without having to deploy any sort of surveillance or tracking devices? Could someone reliably deduce the sexual preferences of a Facebook user without looking at any content that user has shared?

Until recently, most people probably never imagined that their highly sensitive personal data could be so vulnerable to discovery from seemingly innocuous sources. Many continue to believe that the privacy risks from purely public, statistical, and anonymised data are merely theoretical, and that the practical risks are negligibly small. Yet all of the privacy violations described above are not only theoretically possible—they have already been successfully executed.

The foregoing examples of real-world privacy attacks all leverage one particular vulnerability that we refer to as composition effects. This vulnerability stems from the cumulative erosions of privacy that inhere in every piece of data about people. These erosions occur no matter how aggregated, insignificant, or anonymised the data may seem, and even small erosions can combine in unanticipated ways to create big risks.
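A toy illustration may make the mechanism concrete. The sketch below is not drawn from the paper or from any of the attacks it describes. It assumes a hypothetical four-person dataset with invented names and salaries, and shows how two aggregate releases, neither of which names any individual, can be combined to recover one person's exact value (a simple "differencing" attack).

```python
# Hypothetical toy data: every name and value here is invented for illustration.
salaries = {"ana": 52_000, "ben": 61_000, "cho": 58_000, "dee": 97_000}


def mean_salary(records, exclude=None):
    """Return the mean over all records except the optionally excluded name."""
    values = [value for name, value in records.items() if name != exclude]
    return sum(values) / len(values)


# Release 1: the mean over all four people.
release_all = mean_salary(salaries)

# Release 2: the mean over everyone except "dee"
# (for example, a statistic published after dee leaves the group).
release_without_dee = mean_salary(salaries, exclude="dee")

# Composition: each release looks innocuous in isolation, but the difference
# of the two underlying totals is exactly dee's salary.
n = len(salaries)
recovered = release_all * n - release_without_dee * (n - 1)

print(f"Recovered value for dee: {recovered:.0f}")
```

Each release, taken alone, reveals only a group average. The risk appears only when the releases are considered together, which is why a review of each data product in isolation can miss it.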

Privacy and data protection failures from unanticipated composition effects reflect a type of data myopia: a short-sighted approach toward addressing increasingly ubiquitous surveillance and privacy risks from Big Data analytics, characterized by a near-total focus on individual data processors and processes and by pervasive underestimation of systemic risks accumulating from independent data products. The failure to recognize accumulation of risk in the information ecosystem reflects a more general societal blind spot to cumulative systemic risks, with parallels in collective failures to foresee or forestall global financial crises and to adequately address mounting risks to the natural environment.

As the volume and complexity of data uses and publications grow rapidly across a broad range of contexts, the need for frameworks that address cumulative privacy risks is likely to become increasingly urgent and widespread. Threats to privacy are growing due to the accelerating abundance and richness of data about individuals being generated and made publicly available. Furthermore, substantial increases in computing power and algorithmic improvements are making the execution of such attacks more technically feasible. These threats will be impossible to overcome unless regulations are designed to explicitly regulate cumulative risk in a manner that is consistent with the science of composition effects.

Keywords: Data Protection, Privacy, Composition, Computer Science, Personal Data, Statistics, Cumulative Effects, Cumulative Impacts

Suggested Citation

Fluitt, J. Aaron and Cohen, Aloni and Altman, Micah and Nissim, Kobbi and Viljoen, Salome and Wood, Alexandra, Data Protection's Composition Problem (September 9, 2019). European Data Protection Law Review (EDPL), Vol. 5, Iss. 3 (2019) (Forthcoming). Available at SSRN: https://ssrn.com/abstract=3450650

J. Aaron Fluitt (Contact Author)

Institute for Technology Law & Policy

600 New Jersey Avenue, NW
Washington, DC 20001
United States

HOME PAGE: http://www.aaronfluitt.com

Aloni Cohen

Boston University - Hariri Institute for Computing, School of Law

765 Commonwealth Avenue
Boston, MA 02215
United States

Micah Altman

Massachusetts Institute of Technology (MIT) Libraries

77 Massachusetts Avenue
50 Memorial Drive
Cambridge, MA 02139-4307
United States

HOME PAGE: http://informatics.mit.edu

The Brookings Institution

1775 Massachusetts Ave, NW
Washington, DC 20036
United States

Kobbi Nissim

Georgetown University - Department of Computer Science

37th & O St., NW
St. Mary's Hall 329A
Washington, DC 20057
United States

Salome Viljoen

Harvard University

1875 Cambridge Street
Cambridge, MA 02138
United States

Alexandra Wood

Harvard University - Berkman Klein Center for Internet & Society

Harvard Law School
23 Everett, 2nd Floor
Cambridge, MA 02138
United States
