False Discovery in A/B Testing

51 Pages. Posted: 30 Nov 2020. Last revised: 15 Mar 2021.

Ron Berman

University of Pennsylvania - The Wharton School

Christophe Van den Bulte

University of Pennsylvania - Marketing Department

Date Written: March 15, 2021

Abstract

We investigate what fraction of all significant results in website A/B testing are actually null effects, i.e., the false discovery rate (FDR). Our data consist of 4,964 effects from 2,766 experiments conducted on a commercial A/B testing platform. Using three different methods, we find that the FDR ranges between 28% and 37% for tests conducted at 10% significance, and between 18% and 25% for tests at 5% significance (two-sided). These high FDRs stem mostly from the high fraction of true-null effects, about 70%, rather than from low power. Using our estimates, we also assess the potential of various A/B test designs to reduce the FDR. The two main implications are that decision makers should expect about 1 in 5 interventions that achieve significance at the 5% level to be ineffective when deployed in the field, and that analysts should consider using two-stage designs with multiple variations rather than basic A/B tests.
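
To make the link between the true-null share and the FDR concrete, the following is a minimal sketch based on the standard textbook identity FDR = pi0 * alpha / (pi0 * alpha + (1 - pi0) * power). It is not one of the three estimation methods used in the paper. The true-null share of 0.70 and the significance levels come from the abstract; the power values and the function name false_discovery_rate are illustrative assumptions.

```python
def false_discovery_rate(pi0, alpha, power):
    """Expected share of significant results that are true nulls.

    pi0   : fraction of tested effects that are truly null
    alpha : significance level of each test
    power : probability of detecting a non-null effect (assumed value)
    """
    false_positives = pi0 * alpha           # nulls that reach significance
    true_positives = (1.0 - pi0) * power    # real effects that reach significance
    return false_positives / (false_positives + true_positives)


PI0 = 0.70  # true-null share reported in the abstract

# Power values below are hypothetical, chosen only to show the sensitivity.
for alpha in (0.05, 0.10):
    for power in (0.3, 0.5, 0.8):
        fdr = false_discovery_rate(PI0, alpha, power)
        print(f"alpha={alpha:.2f} power={power:.1f} FDR={fdr:.3f}")
```

Under these assumptions, a power of 0.5 at the 5% level gives an FDR of about 19%, which falls inside the 18% to 25% range quoted above. Even with power raised to 0.8, the FDR stays well above the nominal 5%, which illustrates why the abstract attributes the high FDR mainly to the large share of true nulls rather than to low power.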

Keywords: A/B Testing, Statistical Power, Experimentation, False Discovery Rate

JEL Classification: C12, C9, L86, M31

Suggested Citation

Berman, Ron and Van den Bulte, Christophe, False Discovery in A/B Testing (March 15, 2021). Available at SSRN: https://ssrn.com/abstract=3718802 or http://dx.doi.org/10.2139/ssrn.3718802

Ron Berman (Contact Author)

University of Pennsylvania - The Wharton School

3641 Locust Walk
Philadelphia, PA 19104-6365
United States

Christophe Van den Bulte

University of Pennsylvania - Marketing Department

700 Jon M. Huntsman Hall
3730 Walnut Street
Philadelphia, PA 19104-6340
United States
