Calibration of Heterogeneous Treatment Effects in Random Experiments
44 Pages · Posted: 8 Jul 2021 · Last revised: 9 Sep 2022
Date Written: June 28, 2021
Machine learning is commonly used to estimate heterogeneous treatment effects (HTEs) in randomized experiments, in both academia and industry. Using a large-scale randomized experiment on Facebook, we observe a substantial discrepancy between machine learning-based treatment effect estimates and difference-in-means estimates computed directly from the randomized experiment. This paper provides a two-step framework for practitioners and researchers to diagnose and correct this discrepancy. We propose a diagnostic tool to assess whether bias exists in the model-based estimates from machine learning. If bias exists, we provide a model-agnostic method to calibrate any HTE estimates to known, unbiased, subgroup difference-in-means estimates, ensuring that the sign and magnitude of the calibrated estimates approximate the model-free benchmarks. Our method requires no data beyond what is needed to estimate the HTEs, and it scales to arbitrarily large datasets. We use synthetic simulations to explain two sources of bias: misspecification bias and regularization bias. We illustrate the effectiveness of our calibration method using extensive synthetic simulations and two real-world randomized experiments. We further demonstrate the practical value of this calibration in three general and canonical policy-making settings: a prescriptive, budget-constrained optimization framework; a setting seeking to maximize multiple performance indicators; and a multi-treatment uplift modeling setting.
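To make the calibration idea concrete, the following is a minimal sketch, not the paper's actual algorithm: model-based HTE estimates are shifted within each subgroup so that their subgroup mean matches the unbiased difference-in-means estimate from the experiment. The function name `calibrate_hte` and the additive-shift rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def calibrate_hte(tau_hat, subgroup, y, t):
    """Shift model-based HTE estimates so that each subgroup's mean
    matches that subgroup's difference-in-means estimate.

    tau_hat  : model-based HTE estimates, shape (n,)
    subgroup : subgroup label for each unit, shape (n,)
    y        : observed outcomes, shape (n,)
    t        : binary treatment indicator (1 = treated), shape (n,)

    NOTE: a hypothetical sketch of subgroup-level calibration,
    not the method proposed in the paper.
    """
    tau_cal = np.asarray(tau_hat, dtype=float).copy()
    for g in np.unique(subgroup):
        mask = subgroup == g
        # Unbiased difference-in-means benchmark within the subgroup.
        dim = y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()
        # Additive shift: calibrated estimates now average to the benchmark,
        # while preserving the model's within-subgroup ordering.
        tau_cal[mask] += dim - tau_cal[mask].mean()
    return tau_cal
```

Because the shift is applied per subgroup, any HTE model (trees, neural networks, meta-learners) can be calibrated this way after the fact, which is what "model-agnostic" means in this context.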
Keywords: Causal inference, Heterogeneous treatment effects, Random experiments, Calibration, Machine learning