Identification and Formal Privacy Guarantees
69 Pages · Posted: 17 Jul 2020 · Last revised: 4 May 2021
Date Written: April 25, 2021
Empirical economic research crucially relies on highly sensitive individual datasets.
At the same time, the increasing availability of public individual-level data from social
networks, public government records, and directories makes it possible for adversaries to
potentially de-identify anonymized records in sensitive research datasets. The most commonly
accepted formal definition of an individual non-disclosure guarantee is differential privacy.
With differential privacy in place, the researcher interacts with the data by issuing queries
that evaluate functions of the data. The differential privacy guarantee is achieved by
replacing the actual outcome of the query with a randomized outcome, with the amount of
randomness determined by the sensitivity of the outcome to individual observations in the data.
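The canonical instance of such a mechanism is the Laplace mechanism, which adds noise calibrated to the query's sensitivity and the privacy budget epsilon. A minimal sketch follows; the function and variable names, the income example, and the clipping range are our own illustrative assumptions, not constructions from the paper:

```python
import numpy as np

def laplace_mechanism(data, query, sensitivity, epsilon, rng=None):
    """Return a differentially private answer to `query` on `data`.

    `sensitivity` is the worst-case change in the query's value when one
    individual record is changed; `epsilon` is the privacy budget.
    (Illustrative sketch; not the paper's notation.)
    """
    rng = np.random.default_rng() if rng is None else rng
    true_value = query(data)
    # Laplace noise with scale sensitivity / epsilon yields epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Example: a private mean of incomes assumed to lie in [0, 100_000].
incomes = np.array([40_000.0, 52_000.0, 61_000.0, 38_000.0])
# For a mean over a dataset of fixed size n, sensitivity is range / n.
sens = 100_000 / len(incomes)
private_mean = laplace_mechanism(incomes, np.mean, sens, epsilon=1.0)
```

Smaller epsilon (stronger privacy) or higher sensitivity both inflate the noise scale, which is exactly the channel through which privacy protection degrades the statistical quality of the released query answers.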
While differential privacy does provide formal non-disclosure guarantees, its impact on the
identification of empirical economic models, as well as on the performance of estimators in
nonlinear empirical econometric models, has not been sufficiently studied. Since privacy
protection mechanisms are inherently finite-sample procedures, we define the notion of
identifiability of the parameter of interest under differential privacy as a property of the
limit of experiments. It is naturally characterized by concepts from random set theory and is
linked to the asymptotic behavior in measure of differentially private estimators.
We demonstrate that particular instances of regression discontinuity design may be
problematic for inference with differential privacy. The parameters in such designs turn out
to be neither point nor partially identified: the set of differentially private estimators
converges weakly to a random set. Our simulation evidence clearly supports this result. Our
analysis suggests that many other estimators that rely on nuisance parameters may have
similar properties under the requirement of differential privacy. Identification becomes
possible if the target parameter can be deterministically localized within the random set. In
that case, a full exploration of the random set of weak limits of differentially private
estimators allows the data curator to select a sequence of instances of differentially
private estimators that is guaranteed to converge to the target parameter in probability. We
provide a decision-theoretic approach to this selection.
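To see intuitively why regression discontinuity can be fragile under differential privacy, consider a toy simulation in which local sums and counts on each side of the cutoff are privatized with Laplace noise. All design choices below (cutoff at zero, clipping bound, budget split across four queries, bandwidth and sample sizes) are our own illustrative assumptions, not the paper's simulation design:

```python
import numpy as np

def dp_rd_estimate(x, y, bandwidth, epsilon, rng):
    """Toy sharp-RD jump estimate from privatized local sums and counts.

    Outcomes are assumed clipped to [0, B], so each sum query has
    sensitivity B and each count query has sensitivity 1; the epsilon
    budget is split evenly across the four queries. (Illustrative only.)
    """
    above = (x >= 0) & (x < bandwidth)
    below = (x < 0) & (x > -bandwidth)
    B, eps_q = 10.0, epsilon / 4

    def noisy(value, sens):
        return value + rng.laplace(scale=sens / eps_q)

    # Clamp noisy counts away from zero to keep the ratios finite.
    mean_above = noisy(y[above].sum(), B) / max(noisy(above.sum(), 1.0), 1.0)
    mean_below = noisy(y[below].sum(), B) / max(noisy(below.sum(), 1.0), 1.0)
    return mean_above - mean_below

rng = np.random.default_rng(0)
n, tau = 100_000, 2.0
x = rng.uniform(-1, 1, n)
y = np.clip(3.0 + tau * (x >= 0) + rng.normal(0, 0.5, n), 0, 10)

# With a narrow window, few observations fall near the cutoff and the
# Laplace noise does not wash out: repeated runs of the mechanism on the
# SAME data scatter instead of concentrating on the true jump tau.
draws = [dp_rd_estimate(x, y, bandwidth=0.001, epsilon=1.0, rng=rng)
         for _ in range(200)]
```

The persistent randomness across `draws`, driven entirely by the privacy mechanism rather than by sampling, is a loose analogue of the phenomenon the paper formalizes: the privatized estimator's limit is a random set rather than a point.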
Keywords: Differential privacy, average treatment effect, regression discontinuity, random sets, identification
JEL Classification: C35, C14, C25, C13