35 Pages. Posted: 13 Sep 2018
Date Written: September 13, 2018
Consider a decision maker who has to choose one of several alternatives, and who is imperfectly informed about the payoff of each of them. In each period, the decision maker has to decide whether to stop and take one of the alternatives, or to continue researching the alternatives. New information is costly and is never conclusive. We provide a dynamic programming formulation of the decision maker’s problem with either a finite deadline or no deadline, and give necessary and sufficient conditions for research to take place for some prior beliefs about the alternatives. We show that, at least for short deadlines, the decision maker either explores the best alternative and stops after good news, or explores the second-best alternative and stops after bad news, with the former path being optimal if the decision maker is relatively optimistic about the payoff of the alternatives.
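To make the stop-or-research trade-off concrete, the following is a minimal sketch of a finite-deadline dynamic program. It is not the paper's model: the two alternatives, the binary payoffs (1 if good, 0 if bad), the symmetric signal accuracy `q`, the per-period research cost `c`, and the function name `solve` are all assumptions chosen for illustration.

```python
from functools import lru_cache


def solve(p1, p2, deadline, q=0.8, c=0.01):
    """Value of the stopping problem under the illustrative assumptions.

    p1, p2   : prior probabilities (strictly between 0 and 1) that each
               alternative pays 1 rather than 0   (assumed payoff structure)
    deadline : number of periods in which research is still possible
    q        : probability a signal matches the true payoff, 0.5 < q < 1,
               so information is never conclusive  (assumed signal model)
    c        : cost of one period of research      (assumed)
    """
    priors = (p1, p2)

    def posterior(i, n):
        # Belief about alternative i after a net count of n good signals.
        # With symmetric signals only good-minus-bad matters.
        odds = priors[i] / (1 - priors[i]) * (q / (1 - q)) ** n
        return odds / (1 + odds)

    @lru_cache(maxsize=None)
    def v(n1, n2, t):
        beliefs = (posterior(0, n1), posterior(1, n2))
        stop = max(beliefs)  # stop now and take the better-looking alternative
        if t == 0:
            return stop      # deadline reached: stopping is the only option
        best = stop
        for i in (0, 1):
            p = beliefs[i]
            p_good = p * q + (1 - p) * (1 - q)  # chance of good news about i
            up = (n1 + 1, n2) if i == 0 else (n1, n2 + 1)
            down = (n1 - 1, n2) if i == 0 else (n1, n2 - 1)
            research = -c + p_good * v(*up, t - 1) + (1 - p_good) * v(*down, t - 1)
            best = max(best, research)
        return best

    return v(0, 0, deadline)


if __name__ == "__main__":
    # Research is worthwhile when the value exceeds the best prior payoff.
    print(solve(0.6, 0.5, deadline=1), "vs. stopping immediately:", 0.6)
```

In this toy version, research takes place at the given priors exactly when `solve` returns more than `max(p1, p2)`; raising `c` far enough makes immediate stopping optimal.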
Keywords: optimal sequencing of experimentation, multi-armed bandit problem, bandits, Pandora's Box, sequential sampling
JEL Classification: C41, C61