Machine Learning for Instrumental Variable Regression: From Bias to Resilience

Peng, Jing

doi:10.2139/ssrn.5008641

45 Pages Posted: 8 Jan 2025

Jing Peng

University of Connecticut - Department of Operations & Information Management

Date Written: November 04, 2024

Abstract

The application of machine learning (ML) in causal inference has attracted significant attention from researchers. A particular focus is the integration of ML into two-stage least squares (2SLS), a cornerstone method of causal inference. While ML can improve the efficiency of 2SLS by reducing prediction error in the first stage, a major hurdle is the so-called forbidden regression. Specifically, a nonlinear first stage is commonly deemed forbidden because the first-stage prediction may not be orthogonal to its prediction error, which can lead to inconsistent estimates. To provide generalizable insights into the applicability of ML in the first stage of 2SLS, this paper decomposes the bias of a generalized 2SLS estimator into an observable bias and an unobservable bias, without specifying the functional form of the first stage or assuming the proposed instrument to be truly exogenous. Analytical results and extensive simulations show that while a linear prediction can ensure a zero observable bias, it may result in a substantial unobservable bias, especially when the instrument is weak or not strictly exogenous. Conversely, with constrained or orthogonalized ML predictions, it is possible, and even guaranteed under certain conditions, to reduce the unobservable bias without introducing an observable bias. By deriving the expression of the bias under minimal assumptions, this paper identifies the sufficient and practically necessary condition for the consistency of ML-augmented 2SLS and offers valuable and previously unexplored insights into its resilience to imperfect instruments, establishing crucial theoretical foundations for the integration of ML into instrumental variable regression.
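The forbidden-regression concern above can be made concrete with a small simulation. The Python sketch below (numpy only) rests on assumptions of our own, not on the paper: a simulated data-generating process, ridge regression as a hypothetical stand-in for an ML first stage, and an OLS refit of x on the prediction as one simple way to orthogonalize it. When the second stage regresses y on a prediction x_hat, the exact sample identity b - beta = beta*cov(x_hat, e)/var(x_hat) + cov(x_hat, u)/var(x_hat), with e = x - x_hat, splits the estimator's error into a term that depends only on observables (up to the factor beta) and a term that involves the unobservable structural error u. This identity is offered only as an illustration consistent with the abstract's wording; it is not necessarily the decomposition derived in the paper.

```python
import numpy as np

def cov(a, b):
    """Sample covariance (1/n normalization); bilinear, so the identity below is exact."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))

def ridge_first_stage(z, x, lam):
    """Hypothetical stand-in for an ML first stage: ridge regression of x on (z, z^2, z^3).
    lam = 0 is plain OLS; lam > 0 shrinks the fit and breaks prediction/error orthogonality."""
    feats = np.column_stack([z, z**2, z**3])
    feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)  # standardized features
    # Centering x leaves the intercept unpenalized.
    coef = np.linalg.solve(feats.T @ feats + lam * np.eye(3), feats.T @ (x - x.mean()))
    return x.mean() + feats @ coef

def orthogonalize(x, x_hat):
    """Refit x on [1, x_hat] by OLS so the new prediction is orthogonal to its residual."""
    slope = cov(x_hat, x) / cov(x_hat, x_hat)
    return x.mean() + slope * (x_hat - x_hat.mean())

def plug_in_decomposition(y, x, x_hat, u, beta):
    """Second stage: slope of y on x_hat. Returns (estimate, observable, unobservable),
    where estimate - beta = observable + unobservable holds exactly in sample."""
    var_hat = cov(x_hat, x_hat)
    estimate = cov(x_hat, y) / var_hat
    observable = beta * cov(x_hat, x - x_hat) / var_hat  # zero iff x_hat is orthogonal to e
    unobservable = cov(x_hat, u) / var_hat               # involves the structural error u
    return estimate, observable, unobservable

# Assumed DGP: z is a valid instrument; v makes x endogenous.
rng = np.random.default_rng(0)
n, beta = 5000, 1.0
z = rng.normal(size=n)
v = rng.normal(size=n)
u = 0.8 * v + 0.6 * rng.normal(size=n)  # structural error, correlated with x through v
x = 0.5 * z + 0.5 * z**2 + v            # nonlinear first stage
y = 1.0 + beta * x + u

x_hat_ridge = ridge_first_stage(z, x, lam=float(n))
predictions = {
    "ols first stage": ridge_first_stage(z, x, lam=0.0),
    "ridge first stage": x_hat_ridge,
    "ridge, orthogonalized": orthogonalize(x, x_hat_ridge),
}

results = {}
for name, x_hat in predictions.items():
    results[name] = plug_in_decomposition(y, x, x_hat, u, beta)
    est, obs, unobs = results[name]
    print(f"{name:24s} estimate={est:.3f} observable={obs:+.3f} unobservable={unobs:+.3f}")

naive = cov(x, y) / cov(x, x)  # OLS of y on x, biased by endogeneity
print(f"{'naive OLS of y on x':24s} estimate={naive:.3f}")
```

With the OLS first stage the observable term is zero by construction, ridge shrinkage makes it positive, and the refit removes it again. The refit estimator is algebraically equal to using the ridge prediction as an instrument, cov(x_hat, y)/cov(x_hat, x). For brevity the first stage is fit on the same sample as the second stage.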

Keywords: machine learning, causal inference, 2SLS, bias decomposition, endogeneity decomposition, imperfect instruments

Suggested Citation:

Peng, Jing, Machine Learning for Instrumental Variable Regression: From Bias to Resilience (November 04, 2024). Available at SSRN: https://ssrn.com/abstract=5008641 or http://dx.doi.org/10.2139/ssrn.5008641