Optimal Allocation Strategies in a Discrete-Time Bandit Problem
32 Pages | Posted: 28 May 2025 | Publication Status: Under Review
Abstract
We study an exponential bandit model in discrete time, in which an agent must decide how to allocate a limited, perfectly divisible resource (e.g., time) each period in pursuit of a possible breakthrough under uncertainty. Departing from the binary (either-or) strategies commonly assumed in the literature, we analyze continuous allocation strategies using a classical variational approach combined with the principle of optimality from dynamic programming. The solution to the bandit problem is a unique optimal belief-allocation path, characterized by an "Euler-type" recursive transformation and a "transversality condition at infinity." The optimal path exhibits two notable features: (i) persistence, in that experimentation either ends with a breakthrough or never stops, and (ii) adherence to a "Goldilocks principle," whereby the agent's incentives to experiment are maximized at specific task difficulties. We show that when allocations are allowed to take any value in an interval, no binary strategy with a stopping time is optimal for the exponential bandit.
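To fix ideas, here is a minimal sketch of the belief dynamics in a discrete-time exponential bandit, under notation of our own (the symbols $p_t$, $a_t$, and $\lambda$ are illustrative assumptions, not necessarily the paper's). Suppose the task is solvable with prior probability $p_t$, and an allocation $a_t \in [0,1]$ in period $t$ yields a breakthrough with probability $1 - e^{-\lambda a_t}$ if the task is solvable, and never otherwise. Absent a breakthrough, Bayes' rule gives
\[
p_{t+1} \;=\; \frac{p_t\, e^{-\lambda a_t}}{p_t\, e^{-\lambda a_t} + 1 - p_t},
\]
so beliefs drift downward along any path with positive allocations; this belief is the state variable underlying the belief-allocation path described above.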
Keywords: optimal resource allocation, belief-allocation path, discrete time, exponential distribution, Goldilocks principle