Poisson Bandits of Evolving Shade of Gray
44 pages. Posted: 24 Aug 2015
Date Written: August 23, 2015
In standard optimal stopping problems, actions are artificially restricted to the moments at which costs or benefits are observed. In standard experimentation and learning models based on two-armed Poisson bandits, an action can be taken between two consecutive observations. The latter models, however, do not recognize that the timing of decisions depends not only on the arrival rate of observations but also on the stochastic dynamics of costs or benefits. We combine these two strands of literature and consider bandits of "evolving shade of gray" instead of two-armed bandits that are either "white knights" or "black villains." Stopping decisions in a model with Poisson bandits of "evolving shade of gray" are qualitatively different from those in standard optimal stopping or Poisson bandit models. We demonstrate that it may not be optimal to act immediately upon an observation, even if successes or failures are conclusive.
Keywords: two-armed Poisson bandits, optimal stopping, jump-diffusion processes
JEL Classification: C73, C61, D81