Where A-B Testing Goes Wrong: What Online Experiments Cannot (and Can) Tell You About How Customers Respond to Advertising
59 Pages · Posted: 30 Jul 2021 · Last revised: 25 May 2023
Date Written: May 25, 2023
Marketers use online advertising platforms to compare user responses to different ad content. However, platforms’ experimentation tools deliver ads to distinct, optimized, undetectable mixes of users that vary across ads, even during the test. As a result, the estimated A-B comparison in the data reflects the combination of ad content and algorithmic selection of users, which differs from what would have occurred under random exposure. We empirically demonstrate this “divergent delivery” pattern using data from an A-B test that we ran on a major ad platform. This paper explains how algorithmic targeting, user heterogeneity, and data aggregation conspire to confound the magnitude, and even the sign, of ad A-B test results, and traces the implications for marketing roles with differing experimentation goals. We also consider the counterfactual case of disabling divergent delivery, in which user types are balanced across ads. By extending the potential outcomes model of causal inference, we treat random assignment of ads and user exposure to ads as independent decisions. Since not all marketers share the same decision-making goals for these ad A-B tests, we offer prescriptive guidance to experimenters based on their needs.
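The sign reversal described above is an instance of Simpson's paradox, and it can be reproduced with a small numeric sketch. The segment names, click rates, and delivery shares below are invented for illustration; they are not the paper's data. Ad A beats ad B within every user segment, yet because the platform delivers each ad to a different mix of segments, the aggregate comparison favors B.

```python
# Hypothetical illustration (all numbers invented): two user segments
# with different baseline responsiveness, and two ads, A and B.
# True per-segment click rates: ad A beats ad B in BOTH segments.
rates = {
    "A": {"engaged": 0.10, "casual": 0.02},
    "B": {"engaged": 0.08, "casual": 0.01},
}

# Divergent delivery: the platform routes each ad to a different mix of
# segments (share of each ad's impressions going to "engaged" users).
delivery_share_engaged = {"A": 0.2, "B": 0.8}

def aggregate_rate(ad):
    """Observed click rate when segment membership is not recorded."""
    p = delivery_share_engaged[ad]
    return p * rates[ad]["engaged"] + (1 - p) * rates[ad]["casual"]

agg_A = aggregate_rate("A")  # 0.2 * 0.10 + 0.8 * 0.02 = 0.036
agg_B = aggregate_rate("B")  # 0.8 * 0.08 + 0.2 * 0.01 = 0.066

# Within every segment A > B, yet aggregated over the platform's chosen
# delivery, B appears to win: the sign of the comparison flips.
assert all(rates["A"][s] > rates["B"][s] for s in ("engaged", "casual"))
assert agg_B > agg_A
```

Balancing user types across ads, as in the counterfactual of disabled divergent delivery, amounts to forcing the two delivery shares to be equal, which removes the reversal.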
Keywords: Targeted online advertising, A/B testing, measuring advertising effectiveness, causal inference, experimental design, Simpson's paradox, social media
JEL Classification: C9, M31, M37
Suggested Citation: