Causal Inference with Selectively Deconfounded Data

50 pages. Posted: 12 Oct 2021

Kyra Gan

Carnegie Mellon University

Andrew Li

Carnegie Mellon University

Zachary Chase Lipton

affiliation not provided to SSRN

Sridhar R. Tayur

Carnegie Mellon University - David A. Tepper School of Business

Date Written: October 11, 2021

Abstract

In general, when treatments and effects are observed but confounders are not, the average treatment effect (ATE) is not identifiable. To estimate the ATE, a practitioner must then either (a) collect deconfounded data; (b) run a clinical trial; or (c) elucidate further properties of the causal graph that might render the ATE identifiable. In this paper, we consider the benefit of incorporating a large confounded observational dataset (confounder unobserved) alongside a small deconfounded observational dataset (confounder revealed) when estimating the ATE. Our theoretical results show that including confounded data can significantly reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level. Moreover, in some settings, such as genetics, we could imagine retrospectively selecting samples to deconfound. We demonstrate that by actively selecting these samples based on the (already observed) treatment and outcome, we can further reduce the amount of deconfounded data required. Our theoretical results establish that the worst-case performance of our approach relative to random selection is bounded, while its best-case gains are unbounded. We perform extensive synthetic experiments to validate our theoretical results. Finally, we demonstrate the practical benefits of selective deconfounding using a large real-world dataset related to genetic mutation in cancer.
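To make the identifiability point concrete, the following is a minimal sketch, not the paper's estimator. It assumes a single binary confounder Z, binary treatment T, and binary outcome Y, with made-up probabilities; every name in it (`P_Z1`, `joint`, `naive_difference`, `adjusted_ate`) is hypothetical. It works with exact population quantities rather than samples: the (Y, T) marginal stands in for what a large confounded dataset can estimate, and P(Z | Y, T) stands in for what a small deconfounded dataset adds.

```python
from itertools import product

# Hypothetical ground-truth model (illustrative numbers, not from the paper).
P_Z1 = 0.5                        # P(Z=1)
P_T1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(T=1 | Z=z)
P_Y1_GIVEN_TZ = {                 # P(Y=1 | T=t, Z=z), keyed by (t, z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}

BINARY_PAIRS = list(product((0, 1), repeat=2))


def joint(y, t, z):
    """P(Y=y, T=t, Z=z) under the hypothetical model above."""
    pz = P_Z1 if z else 1 - P_Z1
    pt = P_T1_GIVEN_Z[z] if t else 1 - P_T1_GIVEN_Z[z]
    py = P_Y1_GIVEN_TZ[(t, z)] if y else 1 - P_Y1_GIVEN_TZ[(t, z)]
    return pz * pt * py


def naive_difference(p_yt):
    """P(Y=1 | T=1) - P(Y=1 | T=0), using only the (Y, T) marginal."""
    def p_y1_given_t(t):
        return p_yt[(1, t)] / (p_yt[(0, t)] + p_yt[(1, t)])
    return p_y1_given_t(1) - p_y1_given_t(0)


def adjusted_ate(p_yt, p_z1_given_yt):
    """Backdoor-adjusted ATE from P(Y, T) and P(Z=1 | Y, T)."""
    # Rebuild the full joint as P(Y, T) * P(Z | Y, T).
    full = {}
    for y, t in BINARY_PAIRS:
        q = p_z1_given_yt[(y, t)]
        full[(y, t, 1)] = p_yt[(y, t)] * q
        full[(y, t, 0)] = p_yt[(y, t)] * (1 - q)

    ate = 0.0
    for z in (0, 1):
        p_z = sum(full[(y, t, z)] for y, t in BINARY_PAIRS)
        # P(Y=1 | T=t, Z=z) for treated and untreated units in stratum z.
        p_y1_t1 = full[(1, 1, z)] / (full[(0, 1, z)] + full[(1, 1, z)])
        p_y1_t0 = full[(1, 0, z)] / (full[(0, 0, z)] + full[(1, 0, z)])
        ate += p_z * (p_y1_t1 - p_y1_t0)
    return ate


# What confounded data can reveal: the (Y, T) marginal, keyed by (y, t).
P_YT = {(y, t): joint(y, t, 0) + joint(y, t, 1) for y, t in BINARY_PAIRS}
# What deconfounded data adds: P(Z=1 | Y=y, T=t).
P_Z1_GIVEN_YT = {(y, t): joint(y, t, 1) / P_YT[(y, t)] for y, t in BINARY_PAIRS}

naive = naive_difference(P_YT)
ate = adjusted_ate(P_YT, P_Z1_GIVEN_YT)
print(f"naive difference: {naive:.2f}, adjusted ATE: {ate:.2f}")
```

In this toy model the naive contrast computed from (Y, T) alone overstates the effect, because Z raises both the chance of treatment and the chance of the outcome. Once P(Z | Y, T) is available, the adjusted value matches the effect built into the model. The factorization P(Y, T, Z) = P(Y, T) P(Z | Y, T) is one natural way to see why a large confounded sample could reduce the burden on the deconfounded sample: the first factor needs no confounder measurements at all.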

Keywords: causal inference, selective deconfounding, optimization, observational study

Suggested Citation

Gan, Kyra and Li, Andrew and Chase Lipton, Zachary and Tayur, Sridhar R., Causal Inference with Selectively Deconfounded Data (October 11, 2021). Available at SSRN: https://ssrn.com/abstract=3940523 or http://dx.doi.org/10.2139/ssrn.3940523

Kyra Gan (Contact Author)

Carnegie Mellon University

Pittsburgh, PA 15213-3890
United States

Andrew Li

Carnegie Mellon University

5000 Forbes Avenue
Pittsburgh, PA 15213-3890
United States

Zachary Chase Lipton

affiliation not provided to SSRN

Sridhar R. Tayur

Carnegie Mellon University - David A. Tepper School of Business

5000 Forbes Avenue
Pittsburgh, PA 15213-3890
United States
