Implications of Data Anonymization on the Statistical Evidence of Disparity

32 Pages Posted:

Heng Xu

American University - Kogod School of Business

Nan Zhang

American University - Kogod School of Business

Date Written: July 28, 2020

Abstract

Research on and practical development of data anonymization techniques have proliferated in recent years. Yet limited attention has been paid to the potentially disparate impact of privacy protection on underprivileged sub-populations. This study is one of the first attempts to examine the extent to which data anonymization could mask the gross statistical disparities between sub-populations in the data. We first describe two common mechanisms of data anonymization and two prevalent types of statistical evidence for disparity. We then develop a conceptual foundation and mathematical formalism demonstrating that the two data anonymization mechanisms have distinctive impacts on the identifiability of disparity, and that these impacts also vary with how disparity is operationalized statistically. After validating our findings with empirical evidence, we discuss the business and policy implications, highlighting the need for firms and policy makers to balance the protection of privacy against the recognition and rectification of disparate impact.

Keywords: privacy, data anonymization, discrimination, statistical disparity

JEL Classification: J71, K31, K13

Suggested Citation

Xu, Heng and Zhang, Nan, Implications of Data Anonymization on the Statistical Evidence of Disparity (July 28, 2020). Available at SSRN: https://ssrn.com/abstract=

Heng Xu

American University - Kogod School of Business

4400 Massachusetts Avenue NW
Washington, DC 20816-8044
United States

Nan Zhang (Contact Author)

American University - Kogod School of Business

4400 Massachusetts Avenue NW
Washington, DC 20816-8044
United States
