Controlling for Group-Level Heterogeneity in Causal Forest
55 Pages. Posted: 20 Aug 2021. Last revised: 9 Nov 2023
Date Written: August 1, 2023
Abstract
Causal forest is a doubly robust, machine-learning-based estimator that recovers heterogeneous treatment effects and provides more nuanced, complete answers to economic questions than estimators that recover only an average effect. However, using causal forest outside of its original setting (a cross-section of data with random treatment) is challenging because there is no effective way to control for unobservable, group-level commonalities in forest estimations. Group-level commonalities can confound (i.e., bias) treatment effects recovered with causal forest or similar estimators. We provide a solution: estimate group-level effects in a first-stage regression, create a vector of estimated group-level coefficients, and include this vector in a second-stage causal forest estimation. Results from Monte Carlo experiments in simulated data and from an application demonstrate that our solution succeeds, and that the alternatives fall short, at recovering both the average treatment effect and heterogeneity in treatment effects. The application compares several methods of controlling for unobserved, group-level heterogeneity at the client level when estimating how effectively a sales force visit increases sales. Only our proposed method recovers a sales force visit value within the benchmark range recovered via an earlier quasi-random experiment. Our method greatly increases the number of settings in which unbiased, heterogeneous treatment effects are recoverable using tree-based estimators.
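The first stage of the two-stage procedure described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first stage is an ordinary least-squares regression of the outcome on group dummies and a single covariate; the abstract does not specify the regression, so the function names (`group_effects`, `add_group_effect_column`), the specification, and the toy data are all hypothetical. The second-stage causal forest is not shown: the resulting feature rows would be passed to whichever causal forest implementation is used.

```python
from collections import defaultdict


def group_effects(y, x, groups):
    """Least-squares dummy-variable fit of y_i = alpha_g + beta * x_i + e_i.

    Assumed first-stage specification (not taken from the paper).
    Returns (alpha, beta), where alpha maps each group to its intercept.
    Requires x to vary within at least one group, otherwise beta is undefined.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for yi, xi, g in zip(y, x, groups):
        s = sums[g]
        s[0] += yi
        s[1] += xi
        s[2] += 1
    means = {g: (sy / n, sx / n) for g, (sy, sx, n) in sums.items()}

    # Within-group (demeaned) slope, which equals the dummy-variable OLS slope.
    num = den = 0.0
    for yi, xi, g in zip(y, x, groups):
        ybar, xbar = means[g]
        num += (xi - xbar) * (yi - ybar)
        den += (xi - xbar) ** 2
    beta = num / den

    # Each group's intercept is its mean outcome net of the covariate effect.
    alpha = {g: ybar - beta * xbar for g, (ybar, xbar) in means.items()}
    return alpha, beta


def add_group_effect_column(x, groups, alpha):
    """Build second-stage feature rows: [covariate, estimated group effect]."""
    return [[xi, alpha[g]] for xi, g in zip(x, groups)]


# Toy, noise-free data: two groups with intercepts 1 and 5 and a slope of 2.
groups = ["a", "a", "a", "b", "b", "b"]
x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0, 5.0, 7.0, 9.0]

alpha, beta = group_effects(y, x, groups)
features = add_group_effect_column(x, groups, alpha)
```

Here `features` holds one row per observation, with the estimated group-level coefficient appended as an extra column, so a forest can split on it like any other covariate.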
Keywords: causal forest, LASSO, fixed effects, unobserved heterogeneity, panel data.
JEL Classification: C10, C14, C31.