Missing Data as a Causal Inference Problem

Proceedings of the Neural Information Processing Systems Conference (NIPS), 2013, Forthcoming

16 Pages Posted: 26 Oct 2013

Karthika Mohan

University of California, Los Angeles (UCLA)

Judea Pearl

University of California, Los Angeles (UCLA) - Computer Science Department

Tian Jin

Iowa State University

Date Written: June 5, 2013

Abstract

We address the problem of deciding whether there exists an unbiased estimator of a given relation Q when data are missing not at random. We employ a formal representation called "Missingness Graphs" to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured. Using this representation, we define the notion of recoverability, which ensures that, for a given missingness graph G and a given query Q, an algorithm exists that produces an unbiased estimate of Q. That is, in the limit of large samples, the algorithm should produce an estimate of Q as if no data were missing. We further present conditions that the graph should satisfy for recoverability to hold and devise algorithms to detect the presence of these conditions.
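
To make the idea of recoverability concrete, the following is a minimal simulation sketch. It is not the paper's algorithm. It assumes a hypothetical two-variable graph in which a fully observed binary X influences both a binary Y and the mechanism that hides Y, so that the missingness of Y is independent of Y given X. Under that assumption, P(Y) can be estimated by averaging the observed-case conditional P(Y | X) over the distribution of X, whereas the plain complete-case average is biased. All names and probabilities below (simulate, complete_case_estimate, recovered_estimate, 0.8, 0.2, 0.7, 0.1) are illustrative choices, not values from the paper.

```python
import random


def simulate(n, seed=0):
    """Draw n records (x, y_observed) from an assumed toy model.

    X is always observed. Y is hidden with a probability that depends
    on X only, so missingness is independent of Y given X.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1 if rng.random() < (0.8 if x == 1 else 0.2) else 0
        # Assumed missingness mechanism: Y is hidden more often when X = 1.
        hidden = rng.random() < (0.7 if x == 1 else 0.1)
        rows.append((x, None if hidden else y))
    return rows


def complete_case_estimate(rows):
    """Naive estimate of P(Y=1): average Y over records where Y is observed."""
    observed = [y for _, y in rows if y is not None]
    return sum(observed) / len(observed)


def recovered_estimate(rows):
    """Estimate P(Y=1) as sum over x of P(Y=1 | X=x, Y observed) * P(X=x)."""
    n = len(rows)
    total = 0.0
    for x_val in (0, 1):
        stratum = [y for x, y in rows if x == x_val]
        observed = [y for y in stratum if y is not None]
        p_x = len(stratum) / n                      # uses all records
        p_y_given_x = sum(observed) / len(observed)  # uses observed records
        total += p_y_given_x * p_x
    return total


if __name__ == "__main__":
    data = simulate(100_000)
    print("complete-case:", round(complete_case_estimate(data), 3))
    print("recovered:    ", round(recovered_estimate(data), 3))
```

In this toy model the true value is P(Y=1) = 0.5 * 0.8 + 0.5 * 0.2 = 0.5, while the complete-case average converges to (0.5 * 0.3 * 0.8 + 0.5 * 0.9 * 0.2) / (0.5 * 0.3 + 0.5 * 0.9) = 0.35. With a large sample the stratified estimate should approach 0.5, behaving as if no data were missing. If Y itself influenced its own missingness, this particular formula would no longer be justified.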

Suggested Citation

Mohan, Karthika and Pearl, Judea and Jin, Tian, Missing Data as a Causal Inference Problem (June 5, 2013). Proceedings of the Neural Information Processing Systems Conference (NIPS), 2013, Forthcoming, Available at SSRN: https://ssrn.com/abstract=2343794

Karthika Mohan

University of California, Los Angeles (UCLA)

405 Hilgard Avenue
Box 951361
Los Angeles, CA 90095
United States

Judea Pearl (Contact Author)

University of California, Los Angeles (UCLA) - Computer Science Department

4732 Boelter Hall
Los Angeles, CA 90095
United States

HOME PAGE: http://www.cs.ucla.edu/~judea/

Tian Jin

Iowa State University

613 Wallace Road
Ames, IA 50011-2063
United States