A Framework for Sharing Confidential Research Data, Applied to Investigating Differential Pay by Race in the U.S. Government

34 Pages. Posted: 2 Aug 2017

Andrés Barrientos

Duke University - Department of Statistical Science

Alexander Bolton

Emory University

Tom Balmat

Duke University - Social Science Research Institute

Jerome Reiter

Duke University

John M. de Figueiredo

Duke University School of Law; Duke University - Fuqua School of Business; National Bureau of Economic Research (NBER); Duke Innovation & Entrepreneurship Initiative

Ashwin Machanavajjhala

Duke University - Department of Computer Science

Yan Chen

Duke University - Department of Computer Science

Charles Kneifel

Duke University - Office of Information Technology

Mark DeLong

Duke University - Office of Information Technology

Date Written: June 1, 2017

Abstract

Data stewards seeking to provide access to large-scale social science data face a difficult challenge: they must share data in ways that protect privacy and confidentiality, remain informative for many analyses and purposes, and are relatively straightforward for data analysts to use. We present a framework for addressing this challenge. The framework is an integrated system with three components: fully synthetic data intended for wide access; secure remote access solutions through which approved users can work with the confidential data; and verification servers that tie the two together by allowing users to assess the quality of analyses run on the synthetic data. We apply this framework to data on the careers of employees of the U.S. federal government, studying differentials in pay by race. The integrated system performs as intended, allowing users to explore the synthetic data for potential pay differentials and to learn through verifications which findings in the synthetic data hold up in the confidential data and which do not. We find differentials across races; for example, the gap between black and white female federal employees' pay increased over the time period studied. We present models for generating synthetic careers and differentially private algorithms for verification of regression results.
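
To make the idea of differentially private verification of a regression result concrete, the following is a minimal sketch under stated assumptions. It is not taken from the paper and may differ from the authors' algorithms. It uses a generic partition-and-count approach with the Laplace mechanism: the confidential records are split at random into disjoint partitions, a simple regression slope is fitted in each, and a noisy count of partitions whose slope falls below a threshold is released. The function names (`laplace_noise`, `ols_slope`, `noisy_sign_count`) and all parameters are hypothetical, chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # A partition with no variation in x carries no slope information.
    return sxy / sxx if sxx > 0 else 0.0


def noisy_sign_count(xs, ys, num_partitions, epsilon, threshold=0.0, seed=None):
    """Noisy count of partitions whose fitted slope is below `threshold`.

    Each record lands in exactly one partition, so changing one record
    changes the count by at most 1. Laplace noise with scale 1/epsilon
    is the standard calibration for a query with sensitivity 1.
    """
    rng = random.Random(seed)
    order = list(range(len(xs)))
    rng.shuffle(order)  # random assignment that does not depend on the data values
    count = 0
    for m in range(num_partitions):
        idx = order[m::num_partitions]  # disjoint slices of the shuffled indices
        slope = ols_slope([xs[i] for i in idx], [ys[i] for i in idx])
        if slope < threshold:
            count += 1
    return count + laplace_noise(1.0 / epsilon, rng)
```

A user who sees a negative coefficient in the synthetic data could request this noisy count from a verification server; a value close to `num_partitions` would suggest that the negative sign also holds in the confidential data. This sketch uses Python's general-purpose `random` module and floating-point noise, so it illustrates the mechanism only and is not suitable for a production privacy system.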

Keywords: confidential data, earnings gap, race, U.S. government

JEL Classification: C51, C53, C55, C81, J15, J45

Suggested Citation

Barrientos, Andrés and Bolton, Alexander and Balmat, Tom and Reiter, Jerome and de Figueiredo, John M. and Machanavajjhala, Ashwin and Chen, Yan and Kneifel, Charles and DeLong, Mark, A Framework for Sharing Confidential Research Data, Applied to Investigating Differential Pay by Race in the U.S. Government (June 1, 2017). Duke I&E Research Paper No. 2017-15; Duke Law School Public Law & Legal Theory Series No. 2017-55. Available at SSRN: https://ssrn.com/abstract=3012163 or http://dx.doi.org/10.2139/ssrn.3012163

Andrés Barrientos

Duke University - Department of Statistical Science

Box 90251
Durham, NC 27708-0251
United States

Alexander Bolton

Emory University

1555 Dickey Drive
327 Tarbutton Hall
Atlanta, GA 30322
United States

Tom Balmat

Duke University - Social Science Research Institute

Campus Box 90989
Durham, NC 27708
United States

Jerome Reiter (Contact Author)

Duke University

100 Fuqua Drive
Durham, NC 27708-0204
United States

John M. de Figueiredo

Duke University School of Law

210 Science Drive
Box 90362
Durham, NC 27708
United States

Duke University - Fuqua School of Business

Box 90120
Durham, NC 27708-0120
United States

National Bureau of Economic Research (NBER)

1050 Massachusetts Avenue
Cambridge, MA 02138
United States

Duke Innovation & Entrepreneurship Initiative

215 Morris St., Suite 300
Durham, NC 27701
United States

Ashwin Machanavajjhala

Duke University - Department of Computer Science

100 Fuqua Drive
Durham, NC 27708-0204
United States

Yan Chen

Duke University - Department of Computer Science

100 Fuqua Drive
Durham, NC 27708-0204
United States

Charles Kneifel

Duke University - Office of Information Technology

100 Fuqua Drive
Durham, NC 27708-0204
United States

Mark DeLong

Duke University - Office of Information Technology

100 Fuqua Drive
Durham, NC 27708-0204
United States
