Optimal Allocation Strategies in a Discrete-Time Bandit Problem
32 Pages | Posted: 28 May 2025 | Publication Status: Under Review
Abstract
We study an exponential bandit model in discrete time, in which an agent must decide how to allocate a limited, perfectly divisible resource (e.g., time) each period in pursuit of a possible breakthrough under uncertainty. Departing from the binary (either-or) strategies commonly assumed in the literature, we analyze continuous allocation strategies using a classical variational approach combined with the principle of optimality from dynamic programming. The solution to the bandit problem is a unique optimal belief-allocation path, characterized by an "Euler-type" recursive transformation and a "transversality condition at infinity." The optimal path exhibits two notable features: (i) persistence, in that experimentation either ends with a breakthrough or never stops, and (ii) adherence to a "Goldilocks principle," whereby the agent's incentives to experiment are maximized at specific task difficulties. We show that when allocations are allowed to take any value in an interval, no binary strategy with a stopping time is optimal for the exponential bandit.
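To fix ideas, here is a minimal sketch of the belief dynamics in a discrete-time exponential bandit, under notation of our own (the symbols $p_t$, $a_t$, and $\lambda$ are illustrative assumptions, not necessarily the paper's). Suppose the task is solvable with prior probability $p_t$, and an allocation $a_t \in [0,1]$ in period $t$ yields a breakthrough with probability $1 - e^{-\lambda a_t}$ if the task is solvable, and never otherwise. Absent a breakthrough, Bayes' rule gives
\[
p_{t+1} \;=\; \frac{p_t\, e^{-\lambda a_t}}{p_t\, e^{-\lambda a_t} + 1 - p_t},
\]
so beliefs drift downward along any path with positive allocations; this belief is the state variable underlying the belief-allocation path described above.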
Keywords: optimal resource allocation, belief-allocation path, discrete time, exponential distribution, Goldilocks principle