Mean Field Analysis of Multi-Armed Bandit Games


Ramki Gummadi (Stanford University)
Ramesh Johari (Stanford University)
Sven Schmit (Stanford University)
Jia Yuan Yu (IBM Research)

April 1, 2013


Abstract:
Much of the classical work on algorithms for multi-armed bandits focuses on rewards that are stationary over time. By contrast, we study multi-armed bandit (MAB) games, where the rewards obtained by an agent also depend on how many other agents choose the same arm (as might be the case in many competitive or cooperative scenarios). Such systems are naturally nonstationary due to the interdependent evolution of agents, and in general MAB games can be intractable to analyze using typical equilibrium concepts (such as perfect Bayesian equilibrium).
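To make this dependence concrete, one simple instance (our illustration; the paper's model is more general) is a linear congestion reward, where an agent on arm a at time t receives:

```latex
% Hypothetical linear-congestion reward; \mu, c, and \varepsilon_t are
% illustrative assumptions, not notation taken from the paper.
R_t(a) = \mu(a) - c \, f_t(a) + \varepsilon_t
% \mu(a):  intrinsic quality of arm a
% c:       sensitivity to congestion (c > 0: negative externality,
%          c < 0: positive externality)
% f_t(a):  fraction of agents playing arm a at time t
% \varepsilon_t: zero-mean noise
```

The sign of c distinguishes the negative- and positive-externality regimes contrasted in the experiments described below.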

We introduce a general model of multi-armed bandit games, and study the dynamics of these games under a large system approximation. We investigate conditions under which the bandit dynamics have a steady state we refer to as a mean field steady state (MFSS). In an MFSS, the proportion of agents playing the various arms, called the population profile, is assumed stationary over time; the steady state definition then requires a consistency check that this stationary profile arises from the policies chosen by the agents.
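A minimal numerical sketch of this consistency check, under assumed ingredients (the linear congestion reward above, and a softmax map standing in for the stationary arm-choice distribution that the agents' bandit policies would generate; none of these specifics come from the paper):

```python
import numpy as np

BASE = np.array([1.0, 0.8, 0.6])  # assumed intrinsic arm qualities
C = 0.5                           # assumed congestion sensitivity

def mean_rewards(profile):
    """Mean reward of each arm under a linear congestion model."""
    return BASE - C * profile

def induced_profile(profile, temperature=0.25):
    """Arm distribution induced when agents soft-maximize mean rewards.

    A stand-in for the steady-state play generated by the agents'
    policies; the paper's agents learn via bandit algorithms rather
    than best-responding directly.
    """
    logits = mean_rewards(profile) / temperature
    w = np.exp(logits - logits.max())
    return w / w.sum()

def find_mfss(arms=3, iters=500, step=0.5, tol=1e-10):
    """Damped fixed-point iteration: an MFSS is a profile that the
    induced-play map reproduces."""
    profile = np.full(arms, 1.0 / arms)
    for _ in range(iters):
        nxt = induced_profile(profile)
        if np.max(np.abs(nxt - profile)) < tol:
            break
        profile = (1 - step) * profile + step * nxt
    return profile

print(find_mfss())  # skewed toward better arms, flattened by congestion
```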

We establish the following results. First, we prove existence of an MFSS under broad conditions. Second, we show under a contraction condition that the MFSS is unique, and that the population profile converges to it from any initial state. Finally, we show that under the same contraction condition, the MFSS is a good approximation to the behavior of finite systems with many agents. The contraction condition requires that the agent population regenerates sufficiently often, and that the reward function is not too sensitive to the population profile. Through numerical experiments, we find that in settings with negative externalities among the agents, convergence obtains even when our condition is violated, while in settings with positive externalities, the condition appears tight.
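The role of regeneration can be probed in simulation. The companion sketch below (same assumed reward model; epsilon-greedy learning and all parameter values are our choices, not the paper's) runs N agents whose memories reset independently with probability BETA per step, playing the role of regeneration. With C > 0 (negative externalities) the empirical profile settles quickly; flipping the sign of C gives the positive-externality regime where convergence can fail:

```python
import numpy as np

rng = np.random.default_rng(0)
BASE = np.array([1.0, 0.8, 0.6])    # same assumed arm qualities as above
C, N, ARMS, STEPS = 0.5, 2000, 3, 3000
BETA, EPS = 0.02, 0.1               # regeneration and exploration rates (assumed)

counts = np.zeros((N, ARMS))        # per-agent pull counts
sums = np.ones((N, ARMS))           # per-agent reward sums (optimistic start)

for t in range(STEPS):
    # Each agent explores w.p. EPS, else exploits its own running estimates.
    means = sums / np.maximum(counts, 1)
    arms = np.where(rng.random(N) < EPS,
                    rng.integers(0, ARMS, N),
                    means.argmax(axis=1))

    # Population profile: fraction of agents on each arm this step.
    profile = np.bincount(arms, minlength=ARMS) / N

    # Reward depends on the agent's arm and on how crowded that arm is.
    rewards = BASE[arms] - C * profile[arms] + 0.1 * rng.standard_normal(N)
    sums[np.arange(N), arms] += rewards
    counts[np.arange(N), arms] += 1

    # Regeneration: each agent is independently replaced, memory reset.
    reborn = rng.random(N) < BETA
    counts[reborn] = 0.0
    sums[reborn] = 1.0

print(profile)  # with C > 0, hovers near a stationary profile for large N
```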

Pages: 28

Keywords: Multi-Armed Bandits



Date posted: May 24, 2012; Last revised: August 11, 2016

Suggested Citation

Gummadi, Ramki and Johari, Ramesh and Schmit, Sven and Yu, Jia Yuan, Mean Field Analysis of Multi-Armed Bandit Games (April 1, 2013). Available at SSRN: http://ssrn.com/abstract=2045842 or http://dx.doi.org/10.2139/ssrn.2045842

Contact Information

Ramki Gummadi
Stanford University
Stanford, CA 94305
United States

Ramesh Johari (Contact Author)
Stanford University
473 Via Ortega
Stanford, CA 94305-9025
United States

Sven Peter Schmit
Stanford University
Stanford, CA 94305
United States

Jia Yuan Yu
IBM Research
Damastown
Dublin
Ireland

