Multi-Armed Exponential Bandit
36 Pages. Posted: 2 Dec 2020. Last revised: 14 Feb 2021.
Date Written: November 3, 2020
Exponential bandits are widely adopted in economics and marketing because of their tractability. This paper analyzes a single-agent, multi-armed version of the exponential bandit, in which the agent dynamically selects arms to maximize total payoff. We motivate the base model with examples in which all arms are of the same type, and then generalize the results to arms that are either independent or dependent. The contribution is fourfold. First, we characterize the agent's optimal policy for choosing arms: under this policy, the agent selects one arm at a time, and each arm is used at most once. Second, we show that, in contrast to the existing literature, the agent may not treat information acquisition as a last-ditch effort before quitting. Third, when payoffs are discounted, an arm may be used more than once. Fourth, when bandits are negatively correlated, the agent may use more than one arm simultaneously. The paper is of both theoretical and practical significance because the model fits a variety of settings, including project selection, product promotion, and drug development. Implications for these applications are discussed.
Keywords: multi-armed bandit, experimentation, exponential distribution, information acquisition, project management
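The tractability mentioned in the abstract can be illustrated with the standard single-arm exponential bandit from the strategic-experimentation literature. This setup is assumed here and is not taken from this paper. An arm is either good or bad. A good arm produces a success at Poisson rate λ, and a bad arm never does. After t units of use with no success, the odds that the arm is good decay as odds_t = odds_0 · e^{-λt}, so beliefs have a closed form. The sketch below computes this posterior and the usage time at which the belief reaches a cutoff. `p_star` is a hypothetical threshold and is not the paper's optimal policy.

```python
import math


def posterior_good(p0: float, lam: float, t: float) -> float:
    """Belief that an arm is good after t units of use with no success.

    Assumes (hypothetically) that a good arm succeeds at Poisson rate `lam`
    and a bad arm never succeeds; `p0` is the prior that the arm is good.
    """
    # Probability of "no success by t" if good, weighted by the prior.
    no_news_if_good = p0 * math.exp(-lam * t)
    # A bad arm never succeeds, so "no news" has probability (1 - p0) under it.
    return no_news_if_good / (no_news_if_good + (1.0 - p0))


def time_to_cutoff(p0: float, lam: float, p_star: float) -> float:
    """Usage time after which the belief falls to a hypothetical cutoff p_star.

    Solves p0*e^{-lam t} / (p0*e^{-lam t} + 1 - p0) = p_star for t.
    Returns 0.0 if the prior is already at or below the cutoff.
    """
    if p0 <= p_star:
        return 0.0
    # Odds decay exponentially: odds_t = odds_0 * e^{-lam t}.
    return math.log((p0 * (1.0 - p_star)) / (p_star * (1.0 - p0))) / lam


if __name__ == "__main__":
    # Starting from a 50/50 prior with lam = 1, the belief reaches 1/3 at t = ln 2.
    t = time_to_cutoff(0.5, 1.0, 1.0 / 3.0)
    print(round(t, 4), round(posterior_good(0.5, 1.0, t), 4))
```

Because the odds shrink by the same factor per unit of use, the belief path depends only on accumulated usage. This is one reason exponential bandits are convenient to analyze.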