Type S Errors in Multi-Armed Bandits
4 Pages. Posted: 7 Oct 2019
Date Written: October 1, 2019
A/B testing is a standard method for evaluating new features and changes to products such as websites. A common pitfall is to look at a test while it is still running and then stop it early. Because of the implicit multiple testing, the resulting p-values are no longer trustworthy and are usually overly optimistic. We investigate the claim that Bayesian methods, unlike frequentist tests, are immune to this “peeking” problem. We demonstrate that two frequently used measures, posterior probability and value remaining, are severely affected by repeated testing. We further show that the results depend strongly on the prior placed on the parameters of interest.
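The peeking effect on posterior probability can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the Beta(1, 1) priors, the 10% conversion rate, the 0.95 decision threshold, and the batch sizes are all assumptions chosen for illustration. Both arms share the same true rate, so any declared winner is a false one. The code compares how often the threshold is crossed at any of the interim looks with how often it is crossed at the single final look.

```python
import numpy as np

def prob_b_beats_a(succ_a, n_a, succ_b, n_b, rng, draws=1000):
    """Monte Carlo estimate of P(rate_B > rate_A) under assumed Beta(1, 1) priors."""
    post_a = rng.beta(1 + succ_a, 1 + n_a - succ_a, size=draws)
    post_b = rng.beta(1 + succ_b, 1 + n_b - succ_b, size=draws)
    return float(np.mean(post_b > post_a))

def simulate(n_sims=200, n_peeks=20, batch=100, rate=0.10, threshold=0.95, seed=0):
    """Return (share of runs that cross the threshold at any peek,
    share of runs that cross it at the final look only)."""
    rng = np.random.default_rng(seed)
    n = batch * np.arange(1, n_peeks + 1)  # visitors per arm at each peek
    peek_hits = 0
    final_hits = 0
    for _ in range(n_sims):
        # Cumulative conversions for two arms with the SAME true rate.
        a = rng.binomial(batch, rate, size=n_peeks).cumsum()
        b = rng.binomial(batch, rate, size=n_peeks).cumsum()
        crossed = []
        for k in range(n_peeks):
            p = prob_b_beats_a(a[k], n[k], b[k], n[k], rng)
            # A "winner" is declared if either arm looks better with high probability.
            crossed.append(p > threshold or p < 1 - threshold)
        peek_hits += any(crossed)    # an analyst who peeks would have stopped
        final_hits += crossed[-1]    # an analyst who waits for the final look
    return peek_hits / n_sims, final_hits / n_sims

peek_rate, final_rate = simulate()
print(f"false winner with peeking:    {peek_rate:.2f}")
print(f"false winner, final look only: {final_rate:.2f}")
```

Because both rates are computed on the same simulated paths, the peeking rate can never fall below the final-look rate; how large the gap is depends on the assumed prior, threshold, and number of looks.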
Keywords: multi-armed bandits, sequential testing, A/B testing
JEL Classification: C1, C11, C12