The A/B Test Deception: Divergent Delivery, Response Heterogeneity, and Erroneous Inferences in Online Advertising Field Experiments
54 Pages. Posted: 30 Jul 2021. Last revised: 25 Feb 2022
Date Written: February 24, 2022
Advertisers and researchers use tools provided by advertising platforms to conduct randomized experiments for testing user responses to creative elements in online ads. Internally valid comparisons between ads require the mix of experimental users exposed to each ad to be similar across all ads. But that internal validity is threatened when platforms' targeting algorithms deliver each ad to its own optimized mix of users, which diverges across ads. We extend the potential outcomes model of causal inference to treat random assignment of ads and the user exposure states for each ad as two separate decisions. We then demonstrate how targeting ads to users leads advertisers to incorrectly infer which ad performs better, based on aggregate test results. Through analysis and simulation, we characterize how bias in the aggregate estimate of the difference between two ads' lifts is driven by the interplay between heterogeneous responses to different ads and how platforms deliver ads to divergent subsets of users. We also identify conditions for an undetectable "Simpson's reversal," in which all unobserved types of users may prefer ad A over ad B, but the advertiser mistakenly infers from aggregate experimental results that users prefer ad B over ad A.
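The reversal described above can be illustrated with a minimal sketch in Python. It is not the paper's model or simulation: the two unobserved user types ("low" and "high"), the response rates, and the delivery mixes are all invented for illustration, and raw response rates stand in for lifts to keep the arithmetic short. Every type responds better to ad A than to ad B, yet the aggregate comparison favors ad B because the platform is assumed to deliver ad B mostly to the more responsive type.

```python
# Hypothetical illustration of a Simpson's reversal under divergent delivery.
# All numbers are invented for this sketch; none come from the paper.

# Response rate of each unobserved user type to each ad (assumed values).
# Within every type, ad A outperforms ad B.
RESPONSE_RATE = {
    "A": {"low": 0.02, "high": 0.10},
    "B": {"low": 0.01, "high": 0.08},
}

# Share of each ad's exposed users that belongs to each type (assumed values).
# Divergent delivery: the platform shows each ad to a different mix of users.
DIVERGENT_MIX = {
    "A": {"low": 0.8, "high": 0.2},
    "B": {"low": 0.2, "high": 0.8},
}

# Balanced delivery: both ads reach the same mix of users.
BALANCED_MIX = {
    "A": {"low": 0.5, "high": 0.5},
    "B": {"low": 0.5, "high": 0.5},
}


def aggregate_rate(ad, mix):
    """Expected response rate the advertiser observes for one ad,
    averaging type-level rates over the mix of users exposed to it."""
    return sum(mix[ad][t] * RESPONSE_RATE[ad][t] for t in RESPONSE_RATE[ad])


def aggregate_winner(mix):
    """Ad that looks better in the aggregate test results."""
    return "A" if aggregate_rate("A", mix) > aggregate_rate("B", mix) else "B"


def type_level_winners():
    """Ad that each unobserved user type actually responds to more."""
    return {
        t: "A" if RESPONSE_RATE["A"][t] > RESPONSE_RATE["B"][t] else "B"
        for t in RESPONSE_RATE["A"]
    }


if __name__ == "__main__":
    print("Preferred ad within each type:", type_level_winners())
    print("Aggregate winner, divergent delivery:", aggregate_winner(DIVERGENT_MIX))
    print("Aggregate winner, balanced delivery:", aggregate_winner(BALANCED_MIX))
```

Under the balanced mix the aggregate comparison agrees with the type-level comparison. Under the divergent mix it does not, and because the types are unobserved, nothing in the aggregate results would reveal the discrepancy to the advertiser.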
Keywords: Targeted online advertising, A/B testing, measuring advertising effectiveness, causal inference, experimental design, Simpson's paradox, social media
JEL Classification: C9, M31, M37