Dynamic Bandits with an Auto-Regressive Temporal Structure

41 pages. Posted: 9 Aug 2021. Last revised: 5 Apr 2023.

Qinyi Chen

Massachusetts Institute of Technology (MIT) - Operations Research Center

Negin Golrezaei

Massachusetts Institute of Technology (MIT) - Sloan School of Management

Djallel Bouneffouf

IBM Research

Date Written: June 4, 2021

Abstract

Multi-armed bandit (MAB) problems are mainly studied under two extreme settings, known as stochastic and adversarial. These two settings, however, do not capture realistic environments such as search engines, marketing, and advertising, in which rewards change stochastically over time. Motivated by this, we introduce and study a dynamic MAB problem with a stochastic temporal structure, where the expected reward of each arm is governed by an auto-regressive (AR) model. Due to the dynamic nature of the rewards, simple "explore and commit" policies fail, as all arms have to be explored continuously over time. We formalize this by characterizing a per-round regret lower bound, where the regret is measured against a strong (dynamic) benchmark. We then present an algorithm whose per-round regret almost matches our regret lower bound. Our algorithm relies on two mechanisms: (i) alternating between recently pulled arms and unpulled arms with potential, and (ii) restarting. These mechanisms enable the algorithm to dynamically adapt to changes and discard irrelevant past information at a suitable rate. In numerical studies, we further demonstrate the strength of our algorithm under non-stationary settings.
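The abstract does not spell out the exact reward dynamics or the algorithm, so the following is only a minimal sketch under stated assumptions. It assumes an AR(1) model in which each arm's expected reward evolves as r_{t+1}(i) = alpha * r_t(i) + eps_t(i) with Gaussian noise, clipped to [-bound, bound]; the paper's actual specification may differ. It shows how regret against the dynamic benchmark (the best arm in each round) is computed and contrasts plain explore-then-commit with a naive restarted variant. All function names and parameter values (`simulate_ar_paths`, `dynamic_regret`, `explore_then_commit`, `restarting_etc`, `alpha`, `block`, ...) are hypothetical, and `restarting_etc` is a simple stand-in for the restarting idea, not the authors' algorithm.

```python
import random
from typing import List

def simulate_ar_paths(k: int, horizon: int, alpha: float, sigma: float,
                      bound: float, seed: int = 0) -> List[List[float]]:
    """Expected-reward path of each arm under an ASSUMED AR(1) model.

    r_{t+1}(i) = clip(alpha * r_t(i) + eps_t(i), -bound, bound),
    eps_t(i) ~ N(0, sigma^2). Illustrative only; the paper's exact
    specification may differ.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(k):
        r = rng.uniform(-bound, bound)  # random initial expected reward
        path = []
        for _ in range(horizon):
            path.append(r)
            r = max(-bound, min(bound, alpha * r + rng.gauss(0.0, sigma)))
        paths.append(path)
    return paths

def dynamic_regret(paths: List[List[float]], actions: List[int]) -> float:
    """Regret against the dynamic benchmark: the best arm in EACH round."""
    return sum(max(p[t] for p in paths) - paths[a][t]
               for t, a in enumerate(actions))

def explore_then_commit(paths, explore_rounds, noise_sd, seed=1):
    """Round-robin exploration with noisy feedback, then commit to one arm."""
    rng = random.Random(seed)
    k, horizon = len(paths), len(paths[0])
    sums, counts = [0.0] * k, [0] * k
    committed = None
    actions = []
    for t in range(horizon):
        if t < explore_rounds:
            a = t % k
            sums[a] += paths[a][t] + rng.gauss(0.0, noise_sd)  # noisy observation
            counts[a] += 1
        else:
            if committed is None:  # choose the best empirical mean only once
                committed = max(range(k), key=lambda i: sums[i] / counts[i]
                                if counts[i] else float("-inf"))
            a = committed
        actions.append(a)
    return actions

def restarting_etc(paths, explore_rounds, block, noise_sd, seed=1):
    """Hypothetical baseline: rerun explore-then-commit from scratch every
    `block` rounds, discarding all earlier observations."""
    horizon = len(paths[0])
    actions = []
    for start in range(0, horizon, block):
        window = [p[start:start + block] for p in paths]
        actions += explore_then_commit(window, explore_rounds, noise_sd,
                                       seed + start)
    return actions

# Usage: per-round regret is total dynamic regret divided by the horizon.
paths = simulate_ar_paths(k=5, horizon=2000, alpha=0.95, sigma=0.3, bound=1.0)
oracle = [max(range(5), key=lambda i: paths[i][t]) for t in range(2000)]
print(dynamic_regret(paths, oracle))  # 0.0: the oracle is the benchmark itself
etc = explore_then_commit(paths, explore_rounds=50, noise_sd=0.1)
rs = restarting_etc(paths, explore_rounds=10, block=100, noise_sd=0.1)
print(dynamic_regret(paths, etc) / 2000, dynamic_regret(paths, rs) / 2000)
```

Because an AR(1) process with alpha < 1 gradually forgets its past, an arm that looked best during exploration need not remain best later, which is the intuition behind the claim that committing fails. The printed per-round regrets depend on the seed and parameters, and this sketch does not claim that either baseline approaches the paper's lower bound.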

Keywords: dynamic bandits, temporal structures, low-regret policy, online learning algorithms

Suggested Citation

Chen, Qinyi and Golrezaei, Negin and Bouneffouf, Djallel, Dynamic Bandits with an Auto-Regressive Temporal Structure (June 4, 2021). Available at SSRN: https://ssrn.com/abstract=3887608 or http://dx.doi.org/10.2139/ssrn.3887608

Qinyi Chen (Contact Author)

Massachusetts Institute of Technology (MIT) - Operations Research Center

77 Massachusetts Avenue
Bldg. E 40-149
Cambridge, MA 02139
United States

Negin Golrezaei

Massachusetts Institute of Technology (MIT) - Sloan School of Management

100 Main Street
E62-416
Cambridge, MA 02142
United States
02141 (Fax)

Djallel Bouneffouf

IBM Research

T. J. Watson Research Center
1 New Orchard Road
Armonk, NY 10504-1722
United States
