Mean Field Equilibria of Multiarmed Bandit Games

Ramki Gummadi

Stanford University

Ramesh Johari

Stanford University - Management Science & Engineering

Jia Yuan Yu

IBM Research

April 1, 2013

Much of the classical work on algorithms for multiarmed bandits focuses on rewards that are stationary over time. By contrast, we study multiarmed bandit (MAB) games, where the rewards obtained by an agent also depend on how many other agents choose the same arm (as might be the case in many competitive or cooperative scenarios). Such systems are naturally nonstationary due to the interdependent evolution of agents, and in general MAB games can be intractable to analyze using typical equilibrium concepts (such as perfect Bayesian equilibrium).
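As a concrete illustration of such reward coupling (a hypothetical example, not the model studied in the paper), consider a congestion-style setting in which arm i's expected reward decreases linearly in the fraction α_i of agents currently playing it:

$$ r_i(\alpha) = \mu_i - c\,\alpha_i, \qquad c > 0, $$

where μ_i is a baseline reward and c measures how sensitive rewards are to crowding. Cooperative settings would instead have rewards increasing in α_i.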

We introduce a general model of multiarmed bandit games, and study a notion of equilibrium inspired by a large-system approximation known as mean field equilibrium (MFE). In such an equilibrium, the proportion of agents playing the various arms, called the population profile, is assumed stationary over time; the equilibrium requires a consistency check that this stationary profile arises from the policies chosen by the agents.
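To make the consistency check concrete (in sketch notation of our own; the paper's formalism may differ), write the population profile as a distribution α over arms, and let Φ(α) denote the long-run profile that emerges when every agent runs its bandit policy against the rewards induced by α. An MFE is then a fixed point of this map:

$$ \alpha^* = \Phi(\alpha^*). $$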

We establish three main results. First, we prove existence of an MFE under general conditions. Second, we show that under a contraction condition the MFE is unique, and that the population profile converges to it from any initial state. Finally, we show that under the same contraction condition, the MFE is a good approximation to the behavior of finite systems with many agents. The contraction condition requires that the agent population is sufficiently mixing and that the sensitivity of the reward function to the population profile is sufficiently low. Through numerical experiments, we find that our main insights appear to hold even when this condition is violated.
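The uniqueness and convergence claims can be read through the lens of Banach's fixed-point theorem: if Φ is K-Lipschitz with K < 1 in some norm, it has a unique fixed point α*, and iterates converge geometrically, $\| \Phi^t(\alpha_0) - \alpha^* \| \le K^t \| \alpha_0 - \alpha^* \|$. The following Python sketch iterates such a profile map for a toy two-armed game; the congestion reward, softmax response rule, and all constants are illustrative assumptions, not the paper's model.

```python
# A minimal toy sketch (not the paper's model): fixed-point iteration on the
# population profile of a 2-armed bandit game. All constants below are
# hypothetical illustrations.

import numpy as np

BASE = np.array([1.0, 0.8])  # baseline mean rewards of the two arms (assumed)
C = 0.3                      # congestion sensitivity of rewards (assumed)
BETA = 2.0                   # sharpness of the population's response (assumed)

def rewards(alpha):
    """Mean reward of each arm when a fraction alpha[i] of agents plays arm i."""
    return BASE - C * alpha

def profile_map(alpha):
    """Long-run play profile induced when agents respond (softly) to the
    rewards generated by profile alpha -- the map whose fixed point is an MFE."""
    z = np.exp(BETA * rewards(alpha))
    return z / z.sum()

# Iterate the map from a uniform initial profile. When the sensitivity C is
# small enough, profile_map is a contraction, so the iterates converge to the
# unique fixed point, mirroring the uniqueness/convergence result.
alpha = np.array([0.5, 0.5])
for _ in range(100):
    alpha = profile_map(alpha)
print("approximate MFE profile:", alpha)
```

From any starting profile, the iterates settle on the same fixed point; increasing the sensitivity C (or the response sharpness BETA) far enough breaks the contraction and can produce oscillation instead, which is the regime the numerical experiments probe.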

Keywords: Multiarmed Bandits


Date posted: May 24, 2012; Last revised: November 10, 2014

Suggested Citation

Gummadi, Ramki and Johari, Ramesh and Yu, Jia Yuan, Mean Field Equilibria of Multiarmed Bandit Games (April 1, 2013). Available at SSRN: http://ssrn.com/abstract=2045842 or http://dx.doi.org/10.2139/ssrn.2045842

Contact Information

Ramki Gummadi (Contact Author)
Stanford University (email)
Stanford, CA 94305
United States
Ramesh Johari
Stanford University - Management Science & Engineering (email)
473 Via Ortega
Stanford, CA 94305-9025
United States
Jia Yuan Yu
IBM Research (email)
