Hedging the Drift: Learning to Optimize Under Non-Stationarity

Forthcoming at Management Science

49 Pages. Posted: 30 Oct 2018. Last revised: 18 Mar 2021

Wang Chi Cheung

Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR)

David Simchi-Levi

Massachusetts Institute of Technology (MIT) - School of Engineering

Ruihao Zhu

Cornell University

Date Written: October 5, 2018

Abstract

We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of non-stationary stochastic bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) non-stationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Beginning with the linear bandit setting, we design and analyze a sliding-window upper-confidence-bound (SW-UCB) algorithm that achieves the optimal dynamic regret bound when the underlying variation budget is known. This budget quantifies the total amount of temporal variation of the latent environments. Boosted by the novel Bandit-over-Bandit framework that adapts to the latent changes, our algorithm can further enjoy nearly optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We extend our results to other related bandit problems, namely the multi-armed bandit, generalized linear bandit, and combinatorial semi-bandit settings, which model a variety of operations research applications. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the "forgetting principle" in the learning processes, which is vital in changing environments. Extensive numerical experiments with synthetic datasets and a dataset of an online auto-loan company during the severe acute respiratory syndrome (SARS) epidemic period demonstrate that our proposed algorithms achieve superior performance compared to existing algorithms.
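The "forgetting principle" behind the sliding-window approach can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the simpler K-armed (multi-armed bandit) setting instead of the linear one, uses a generic sliding-window UCB index with an illustrative exploration bonus rather than the paper's tuning, and fixes the window length by hand instead of choosing it from the variation budget or with the Bandit-over-Bandit framework. The class `SlidingWindowUCB`, the deterministic `reward` environment, and all parameter values are hypothetical.

```python
import math
from collections import deque


class SlidingWindowUCB:
    """Hypothetical sliding-window UCB for K-armed bandits (illustrative sketch).

    Only the most recent `window` observations are kept, so evidence collected
    in a stale environment is forgotten. The bonus sqrt(2 log(min(t, window)) / n)
    is an assumed, generic choice, not the tuning from the paper.
    """

    def __init__(self, n_arms, window):
        self.n_arms = n_arms
        self.window = window
        self.history = deque()  # (arm, reward) pairs, oldest first
        self.t = 0  # number of rounds observed so far

    def select(self):
        # Recompute per-arm counts and reward sums over the window only.
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # Play any arm that has no observation inside the window.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        horizon = min(self.t, self.window)

        def index(arm):
            mean = sums[arm] / counts[arm]
            return mean + math.sqrt(2.0 * math.log(horizon) / counts[arm])

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.t += 1
        self.history.append((arm, reward))
        if len(self.history) > self.window:
            self.history.popleft()  # forget the oldest observation


def reward(arm, t, change_point):
    # Hypothetical, deterministic, abruptly changing environment:
    # arm 0 pays 1 before the change point, arm 1 pays 1 after it.
    before = t < change_point
    return 1.0 if (arm == 0) == before else 0.0


def simulate(horizon=1000, window=100, change_point=500):
    policy = SlidingWindowUCB(n_arms=2, window=window)
    choices = []
    for t in range(horizon):
        arm = policy.select()
        policy.update(arm, reward(arm, t, change_point))
        choices.append(arm)
    return choices


if __name__ == "__main__":
    choices = simulate()
    late = choices[-200:]
    print("share of arm 1 in the last 200 rounds:", late.count(1) / len(late))
```

In this toy run, once the pre-change observations have left the window, the policy plays the newly best arm in most rounds. A longer window reacts more slowly to the change, while a shorter one forgets useful data sooner; in the abstract's framing, the window length is set from the variation budget when it is known, and the Bandit-over-Bandit layer is what adapts it when it is not.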

Keywords: data-driven decision-making, non-stationary bandit optimization, revenue management

Suggested Citation

Cheung, Wang Chi and Simchi-Levi, David and Zhu, Ruihao, Hedging the Drift: Learning to Optimize Under Non-Stationarity (October 5, 2018). Forthcoming at Management Science, Available at SSRN: https://ssrn.com/abstract=3261050 or http://dx.doi.org/10.2139/ssrn.3261050

Wang Chi Cheung

Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR)

Singapore

David Simchi-Levi

Massachusetts Institute of Technology (MIT) - School of Engineering

MA
United States

Ruihao Zhu (Contact Author)

Cornell University

Ithaca, NY 14853
United States
