Differential Privacy for Social Science Inference

44 Pages. Posted: 19 Oct 2015

Vito D'Orazio

University of Texas at Dallas

James Honaker

Pennsylvania State University

Gary King

Harvard University

Date Written: July 24, 2015

Abstract

Social scientists often want to analyze data containing sensitive personal information that must remain private. However, common data-sharing techniques that attempt to preserve privacy bring either great privacy risks or great loss of information. A long literature has shown that anonymization techniques for data releases are generally open to reidentification attacks. Aggregating information can reduce but not prevent this risk, while also reducing the utility of the data to researchers. Even publishing statistical estimates without releasing the data cannot guarantee that no sensitive personal information has been leaked. Differential privacy, which has its roots in cryptography, is one formal, mathematical conception of privacy preservation. It brings provable guarantees that any reported result does not reveal information about any single individual. In this paper we detail the construction of a secure curator interface through which researchers can access privatized statistical results from their queries without gaining any access to the underlying raw data. We introduce differential privacy and the construction of differentially private summary statistics. We then present new algorithms for releasing differentially private estimates of causal effects and for generating differentially private covariance matrices from which any least squares regression may be estimated. We demonstrate the application of these methods through our curator interface.
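
As an illustration of what a differentially private summary statistic can look like, the following is a minimal sketch of a private mean using the standard Laplace mechanism. It is not taken from the paper and is not the authors' curator implementation. The function name `dp_mean`, the clipping bounds `lower` and `upper`, and the treatment of the sample size as public are all assumptions made for this example.

```python
import numpy as np


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Return an epsilon-differentially private mean (illustrative sketch).

    Assumptions (not from the paper): the number of records n is public,
    neighboring datasets differ by replacing one record, and every value
    is clipped to the analyst-supplied range [lower, upper].
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")

    rng = np.random.default_rng() if rng is None else rng

    # Clipping bounds the influence any single record can have on the mean.
    x = np.clip(np.asarray(values, dtype=float), lower, upper)
    n = x.size
    if n == 0:
        raise ValueError("values must be non-empty")

    # Replacing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n

    # Laplace noise with scale sensitivity / epsilon gives epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(x.mean() + noise)


if __name__ == "__main__":
    # Hypothetical data: ages of seven survey respondents.
    ages = [23, 35, 41, 29, 67, 52, 38]
    print(dp_mean(ages, lower=18, upper=90, epsilon=0.5))
```

Smaller values of `epsilon` add more noise and give stronger privacy. The bounds must be chosen without looking at the data, since data-dependent bounds would themselves leak information.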

Suggested Citation

D'Orazio, Vito and Honaker, James and King, Gary, Differential Privacy for Social Science Inference (July 24, 2015). Sloan Foundation Economics Research Paper No. 2676160, Available at SSRN: https://ssrn.com/abstract=2676160 or http://dx.doi.org/10.2139/ssrn.2676160

Vito D'Orazio

University of Texas at Dallas

School of Economic, Political, and Policy Sciences
800 West Campbell Rd
Richardson, TX 75080
United States

James Honaker

Pennsylvania State University

University Park
State College, PA 16802
United States

Gary King (Contact Author)

Harvard University

1737 Cambridge St.
Institute for Quantitative Social Science
Cambridge, MA 02138
United States
617-500-7570 (Phone)

HOME PAGE: http://gking.harvard.edu
