Local Asymptotic Normality for Multi-Armed Bandits

13 Pages. Posted: 13 Dec 2025

Ramon van den Akker

Tilburg University

Bas J.M. Werker

Tilburg University

Bo Zhou

Virginia Tech Econ Department

Abstract

Van den Akker, Werker, and Zhou (2025) showed that the limit experiment, in the sense of Hájek-Le Cam, for (contextual) bandits whose arms’ expected payoffs differ by O(T^{-1/2}), is Locally Asymptotically Quadratic (LAQ) but highly non-standard, being characterized by a system of coupled stochastic differential equations. The present paper considers the complementary case where the arms’ expected payoffs are fixed with a unique optimal (in the sense of highest expected payoff) arm. It is shown that, under sampling schemes satisfying mild regularity conditions (including UCB and Thompson sampling), the model satisfies the standard Locally Asymptotically Normal (LAN) property.
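
For readers unfamiliar with the terminology, the textbook form of the LAN property can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: P_{T,θ} denotes the law of the data observed up to horizon T under parameter θ, δ_T is a generic localizing rate (possibly a matrix, since arms may be sampled at different frequencies), h is a local perturbation, Δ_T is the central sequence, and J is a deterministic, nonsingular information matrix.

```latex
% Textbook LAN expansion (generic notation, assumed for illustration only):
% for every fixed local direction h,
\[
  \log \frac{dP_{T,\theta + \delta_T h}}{dP_{T,\theta}}
  = h^\top \Delta_T - \tfrac{1}{2}\, h^\top J h + o_{P}(1),
  \qquad
  \Delta_T \xrightarrow{d} N(0, J) \quad \text{under } P_{T,\theta}.
\]
% In the more general LAQ case the quadratic term involves a random matrix J_T
% rather than a deterministic J, so the limit need not be a Gaussian shift.
```

The rates and the information matrix that apply to the bandit setting are specified in the paper itself; the display above only records the general shape of the expansion.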

Keywords: multi-armed bandit, limit experiment, local asymptotic normality

Suggested Citation

van den Akker, Ramon and Werker, Bas J.M. and Zhou, Bo, Local Asymptotic Normality for Multi-Armed Bandits. Available at SSRN: https://ssrn.com/abstract=5914972 or http://dx.doi.org/10.2139/ssrn.5914972

Ramon van den Akker

Tilburg University

P.O. Box 90153
Tilburg, DC 5000 LE
Netherlands

Bas J.M. Werker

Tilburg University

P.O. Box 90153
Tilburg, DC 5000 LE
Netherlands

Bo Zhou (Contact Author)

Virginia Tech Econ Department

3021 Pamplin Hall
Blacksburg, VA 24061
United States
