Differential Privacy for Social Science Inference
44 Pages Posted: 19 Oct 2015
Date Written: July 24, 2015
Abstract
Social scientists often want to analyze data containing sensitive personal information that must remain private. However, common techniques for sharing such data while attempting to preserve privacy carry either great privacy risk or great loss of information. A long line of research has shown that anonymization techniques for data releases are generally open to reidentification attacks. Aggregation can reduce but not eliminate this risk, while also reducing the utility of the data to researchers. Even publishing statistical estimates without releasing the data cannot guarantee that no sensitive personal information has been leaked. Differential Privacy, with roots in cryptography, is a formal, mathematical conception of privacy preservation. It brings provable guarantees that a reported result does not reveal information about any single individual. In this paper we detail the construction of a secure curator interface through which researchers can obtain privatized statistical results for their queries without gaining any access to the underlying raw data. We introduce differential privacy and the construction of differentially private summary statistics. We then present new algorithms for releasing differentially private estimates of causal effects and for generating differentially private covariance matrices from which any least squares regression may be estimated. We demonstrate the application of these methods through our curator interface.
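The differentially private summary statistics the abstract describes are typically built from noise-addition mechanisms such as the Laplace mechanism. As a minimal illustrative sketch (not the authors' implementation), the following releases an epsilon-differentially private mean of values clipped to a known range; the function name and parameters are hypothetical:

```python
import numpy as np

def dp_mean(data, lower, upper, epsilon, rng=None):
    """Epsilon-differentially private mean via the Laplace mechanism.

    Values are clipped to [lower, upper], so the sensitivity of the mean
    of n values (the most any one individual can change it) is
    (upper - lower) / n. Adding Laplace noise with scale
    sensitivity / epsilon yields an epsilon-DP release.
    """
    rng = rng or np.random.default_rng()
    clipped = np.clip(np.asarray(data, dtype=float), lower, upper)
    n = len(clipped)
    sensitivity = (upper - lower) / n
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise
```

A curator interface of the kind described would run such a mechanism server-side, returning only the noised statistic so the raw data never leaves the curator.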