Robust Control of the Multi-Armed Bandit Problem
University of California, Los Angeles - Anderson School of Management
Aparupa Das Gupta
University of California, Los Angeles (UCLA) - Decisions, Operations, and Technology Management (DOTM) Area
July 1, 2014
We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We characterize the optimal policy as a project-by-project retirement policy, but we show that the arms become dependent, so the Gittins index policy is no longer optimal. We propose a Lagrangian index policy that is computationally equivalent to evaluating the indices of a non-robust MAB. For a project selection problem, we find that this policy performs near-optimally.
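To make the phrase "evaluating the indices of a non-robust MAB" concrete, the following is a minimal sketch of the textbook retirement (calibration) formulation of the Gittins index for a single arm with known transition probabilities. It is not the Lagrangian index policy of the paper, whose details the abstract does not give. The function names, the reward vector r, the transition matrix P, and the discount factor beta are all assumptions introduced here for illustration.

```python
def retirement_value(r, P, beta, M, tol=1e-10, max_iter=10_000):
    """Value of a single arm when retiring for a lump sum M is always allowed.

    r[s] is the reward in state s, P[s][t] the (known, non-robust) transition
    probability from s to t, and beta the discount factor in (0, 1).
    All of these names are hypothetical; they do not come from the paper.
    """
    n = len(r)
    V = [M] * n  # start from "retire immediately" and iterate upward
    for _ in range(max_iter):
        V_new = [
            max(M, r[s] + beta * sum(P[s][t] * V[t] for t in range(n)))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


def gittins_index(r, P, beta, state, tol=1e-8):
    """Approximate Gittins index of `state`, expressed as a per-period reward.

    Bisection on the retirement reward M: the index is (1 - beta) times the
    smallest M at which retiring is as good as continuing from `state`.
    """
    lo = min(r) / (1 - beta)
    hi = max(r) / (1 - beta)
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        V = retirement_value(r, P, beta, M)
        if V[state] > M + 1e-9:  # continuing is strictly better, so index > M
            lo = M
        else:
            hi = M
    return (1 - beta) * 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical two-state arm: state 0 pays 0 and moves to state 1,
    # which pays 1 forever.
    r = [0.0, 1.0]
    P = [[0.0, 1.0], [0.0, 1.0]]
    for s in range(2):
        print(s, round(gittins_index(r, P, 0.9, s), 4))
```

In this toy arm the index of state 0 should come out close to the discount factor (about 0.9) and the index of state 1 close to 1, since the best stopping rule from either state is to keep playing forever. Each arm is evaluated on its own here, which is exactly the decomposition that the abstract says breaks down once the transition probabilities are ambiguous.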
Number of Pages in PDF File: 21
Date posted: September 17, 2013; Last revised: July 15, 2014