Poisson Bandits of Evolving Shade of Gray

44 Pages. Posted: 24 Aug 2015

Svetlana Boyarchenko

University of Texas at Austin - Department of Economics

Sergei Levendorskii

Calico Science Consulting

Date Written: August 23, 2015


In standard optimal stopping problems, actions are artificially restricted to the moments at which costs or benefits are observed. In standard experimentation and learning models based on two-armed Poisson bandits, an action can be taken between two successive observations. The latter models, however, do not recognize that the timing of decisions depends not only on the arrival rate of observations but also on the stochastic dynamics of costs or benefits. We combine these two strands of literature and consider bandits of "evolving shade of gray" instead of two-armed bandits that are either "white knights" or "black villains." Stopping decisions in a model with Poisson bandits of "evolving shade of gray" are qualitatively different from those in optimal stopping or Poisson bandit models. We demonstrate that it may not be optimal to act immediately upon an observation, even if successes or failures are conclusive.

Keywords: two-armed Poisson bandits, optimal stopping, jump-diffusion processes

JEL Classification: C73, C61, D81

Suggested Citation

Boyarchenko, Svetlana I. and Levendorskii, Sergei Z., Poisson Bandits of Evolving Shade of Gray (August 23, 2015). Available at SSRN: https://ssrn.com/abstract=2649713 or http://dx.doi.org/10.2139/ssrn.2649713

Svetlana I. Boyarchenko (Contact Author)

University of Texas at Austin - Department of Economics

Austin, TX 78712
United States

Sergei Z. Levendorskii

Calico Science Consulting

Austin, TX
United States
