Implications of Data Anonymization on the Statistical Evidence of Disparity
32 Pages
Date Written: July 28, 2020
Research on and practical development of data anonymization techniques have proliferated in recent years. Yet limited attention has been paid to the potentially disparate impact of privacy protection on underprivileged sub-populations. This study is one of the first attempts to examine the extent to which data anonymization could mask the gross statistical disparities between sub-populations in the data. We first describe two common mechanisms of data anonymization and two prevalent types of statistical evidence for disparity. We then develop a conceptual foundation and mathematical formalism demonstrating that the two data anonymization mechanisms have distinct impacts on the identifiability of disparity, and that these impacts also vary with how disparity is operationalized statistically. After validating our findings with empirical evidence, we discuss the business and policy implications, highlighting the need for firms and policy makers to balance the protection of privacy against the recognition and rectification of disparate impact.
Keywords: privacy, data anonymization, discrimination, statistical disparity
JEL Classification: J71, K31, K13