Local Asymptotic Normality for Multi-Armed Bandits

van den Akker, Ramon; Werker, Bas J.M.; Zhou, Bo

doi:10.2139/ssrn.5914972

13 pages. Posted: 13 Dec 2025

Van den Akker, Werker, and Zhou (2025) showed that the limit experiment, in the sense of Hájek-Le Cam, for (contextual) bandits whose arms' expected payoffs differ by O(T^{-1/2}) is Locally Asymptotically Quadratic (LAQ) but highly non-standard, being characterized by a system of coupled stochastic differential equations. The present paper considers the complementary case where the arms' expected payoffs are fixed, with a unique optimal arm (in the sense of highest expected payoff). It is shown that, under sampling schemes satisfying mild regularity conditions (including UCB and Thompson sampling), the model satisfies the standard Locally Asymptotically Normal (LAN) property.
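
For orientation, the textbook form of the LAN property can be sketched as below. This is the generic definition, not the paper's own statement: the symbols \theta (parameter), h (local direction), r_T (normalizing rate, classically T^{-1/2}), \Delta_T (central sequence), and J (Fisher information) are illustrative assumptions, and the abstract does not say which normalization applies in the bandit setting.

```latex
% Generic LAN expansion (illustrative notation, not taken from the paper):
% the log-likelihood ratio of a local alternative against the base parameter
% is asymptotically linear-quadratic in h with a deterministic quadratic term.
\[
  \log \frac{dP^{(T)}_{\theta + r_T h}}{dP^{(T)}_{\theta}}
    = h^\top \Delta_T - \tfrac{1}{2}\, h^\top J\, h + o_{P}(1),
  \qquad
  \Delta_T \xrightarrow{d} N(0, J) \ \text{under } P^{(T)}_{\theta}.
\]
```

In the LAQ case, the fixed matrix J is replaced by a random matrix J_T, so the limit experiment is no longer a Gaussian shift in general.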

Keywords: multi-armed bandit, limit experiment, local asymptotic normality

Suggested Citation:

van den Akker, Ramon and Werker, Bas J.M. and Zhou, Bo, Local Asymptotic Normality for Multi-Armed Bandits. Available at SSRN: https://ssrn.com/abstract=5914972 or http://dx.doi.org/10.2139/ssrn.5914972