Anonymization and Risk
59 Pages Posted: 19 Aug 2015 Last revised: 12 Aug 2017
Date Written: August 17, 2015
Abstract
Perfect anonymization of data sets that contain personal information has failed. But the process of protecting data subjects in shared information remains integral to privacy practice and policy. While the deidentification debate has been vigorous and productive, there is no clear direction for policy. As a result, the law has been slow to adapt a holistic approach to protecting data subjects when data sets are released to others. Currently, the law is focused on whether an individual can be identified within a given set. We argue that the best way to move data release policy past the alleged failures of anonymization is to focus on the process of minimizing risk of reidentification and sensitive attribute disclosure, not preventing harm. Process-based data release policy, which resembles the law of data security, will help us move past the limitations of focusing on whether data sets have been “anonymized.” It draws upon different tactics to protect the privacy of data subjects, including accurate deidentification rhetoric, contracts prohibiting reidentification and sensitive attribute disclosure, data enclaves, and query-based strategies to match required protections with the level of risk. By focusing on process, data release policy can better balance privacy and utility where nearly all data exchanges carry some risk.
Keywords: Privacy, anonymization, deidentification, data release, risk, data security, open data
Suggested Citation: Suggested Citation