Missing Data as a Causal Inference Problem
Proceedings of the Neural Information Processing Systems Conference (NIPS), 2013, Forthcoming
16 Pages. Posted: 26 Oct 2013
Date Written: June 5, 2013
We address the problem of deciding whether there exists an unbiased estimator of a given relation Q when data are missing not at random. We employ a formal representation called "Missingness Graphs" to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured. Using this representation, we define the notion of recoverability, which ensures that, for a given missingness graph G and a given query Q, an algorithm exists that produces an unbiased estimate of Q. That is, in the limit of large samples, the algorithm should produce an estimate of Q as if no data were missing. We further present conditions that the graph should satisfy in order for recoverability to hold and devise algorithms to detect the presence of these conditions.
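To make the idea of recoverability concrete, the following is a minimal sketch on a toy model that is not taken from the paper. It assumes a hypothetical missingness graph in which X is fully observed, Y is partially observed, and the missingness indicator R (with R = 1 meaning "Y is missing", a convention chosen here for illustration) depends only on X. Under that assumption Y is independent of R given X, so the query Q = P(Y = 1) can be written purely in terms of observed quantities as the sum over x of P(Y = 1 | X = x, R = 0) P(X = x). All probabilities and function names below are illustrative assumptions, and the computation is exact (the large-sample limit) rather than sampled.

```python
# Hypothetical toy model (all numbers are illustrative assumptions).
# X is fully observed; Y is missing whenever R = 1; R depends only on X.
p_x = {0: 0.5, 1: 0.5}                # P(X = x)
p_y1_given_x = {0: 0.2, 1: 0.8}       # P(Y = 1 | X = x)
p_missing_given_x = {0: 0.1, 1: 0.7}  # P(R = 1 | X = x)


def observed_distribution():
    """Distribution over what is actually recorded: (x, y), with y = None if missing."""
    obs = {}
    for x, px in p_x.items():
        obs[(x, None)] = px * p_missing_given_x[x]
        for y in (0, 1):
            py = p_y1_given_x[x] if y == 1 else 1 - p_y1_given_x[x]
            obs[(x, y)] = px * py * (1 - p_missing_given_x[x])
    return obs


def complete_case_estimate(obs):
    """P(Y = 1 | R = 0): drops every record with Y missing (biased in this model)."""
    seen = sum(p for (x, y), p in obs.items() if y is not None)
    return sum(p for (x, y), p in obs.items() if y == 1) / seen


def recovered_estimate(obs):
    """Sum over x of P(Y = 1 | X = x, R = 0) * P(X = x), using observed data only."""
    total = 0.0
    for x in (0, 1):
        p_x_all = sum(p for (xx, y), p in obs.items() if xx == x)
        p_x_seen = sum(p for (xx, y), p in obs.items() if xx == x and y is not None)
        total += obs[(x, 1)] / p_x_seen * p_x_all
    return total


obs = observed_distribution()
true_p_y1 = sum(p_x[x] * p_y1_given_x[x] for x in p_x)
print("true:", true_p_y1)
print("complete-case:", complete_case_estimate(obs))
print("recovered:", recovered_estimate(obs))
```

In this toy model the complete-case estimate is about 0.35 while the true value is 0.5, and the recovered estimate matches the true value up to floating-point error. The example covers only this one simple graph; the conditions for recoverability in general graphs are the subject of the paper itself.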