Dynamic Pricing and Learning: An Application of Gaussian Process Regression
35 pages. Posted: 24 Jun 2019
Date Written: June 18, 2019
We consider the problem of offering an optimal price when demand is unknown and must be learned by price experimentation, a variant of the multi-armed bandit problem. In each period, the retailer must choose a price, using the knowledge already acquired to weigh the benefit of exploring other prices against that of maximizing revenue. Other algorithms proposed for such problems either learn the price-demand relation for each price separately or estimate the parameters of an assumed parametric demand function. We instead combine Gaussian process regression with Thompson sampling into a nonparametric learning algorithm (GP-TS) that can learn any functional relation between price and demand. GP-TS scales significantly better with the number of price vectors than existing approaches, yet it makes no restrictive assumptions about the functional form of the price-demand relationship. We show how to apply the algorithm to finite-inventory settings with single and multiple products, and to settings in which exogenous contextual information can affect prices. One advantage of the algorithm is that its performance depends not on the number of price vectors across all products but only on the number of products considered. For each setting, we benchmark our approach against existing algorithms. The learning performance of GP-TS is far superior to that of its peers, especially as the number of products learned concurrently increases. Finally, we propose extensions that allow the algorithm to be applied when demand follows a Bernoulli or Poisson distribution.
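The abstract describes GP-TS only at a high level. As an illustration of the core loop, here is a minimal Python sketch for a single product with unlimited inventory and no contextual information. It assumes a zero-mean GP prior with a squared-exponential kernel, Gaussian observation noise, and a discrete price grid. All function names (`rbf_kernel`, `gp_posterior`, `gp_ts_choose_price`, `simulate`), the kernel, and the hyperparameters are hypothetical choices made for this sketch, not the paper's specification. The finite-inventory, multi-product, contextual, and Bernoulli/Poisson variants are not covered. In each period the GP posterior is fitted to all observed (price, demand) pairs, a single demand curve is sampled from it, and the price that maximizes the sampled revenue is posted. Exploration comes from posterior uncertainty rather than from an explicit exploration rate.

```python
import numpy as np

# Hypothetical sketch: single product, unlimited inventory, zero-mean GP prior,
# squared-exponential (RBF) kernel, Gaussian demand noise, discrete price grid.


def rbf_kernel(a, b, length_scale=0.5, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of prices."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, noise=0.1, length_scale=0.5, variance=1.0):
    """Zero-mean GP regression: posterior mean and covariance of demand on x_grid."""
    k_ss = rbf_kernel(x_grid, x_grid, length_scale, variance)
    if len(x_obs) == 0:
        return np.zeros(len(x_grid)), k_ss  # no data yet: return the prior
    k = rbf_kernel(x_obs, x_obs, length_scale, variance) + noise**2 * np.eye(len(x_obs))
    k_s = rbf_kernel(x_obs, x_grid, length_scale, variance)
    chol = np.linalg.cholesky(k)  # lower-triangular factor of the noisy Gram matrix
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    v = np.linalg.solve(chol, k_s)
    return k_s.T @ alpha, k_ss - v.T @ v


def gp_ts_choose_price(prices, x_obs, y_obs, rng, noise=0.1):
    """One Thompson-sampling step: draw a demand curve from the GP posterior,
    then pick the grid price that maximizes sampled revenue price * demand."""
    x = np.asarray(x_obs, dtype=float)
    y = np.asarray(y_obs, dtype=float)
    mean, cov = gp_posterior(x, y, prices, noise=noise)
    cov = 0.5 * (cov + cov.T) + 1e-9 * np.eye(len(prices))  # symmetrize, add jitter
    sample = rng.multivariate_normal(mean, cov, check_valid="ignore")
    return prices[np.argmax(prices * sample)]


def simulate(true_demand, prices, periods=50, noise=0.1, seed=0):
    """Run GP-TS against a hypothetical demand function with Gaussian noise."""
    rng = np.random.default_rng(seed)
    x_obs, y_obs, revenue = [], [], []
    for _ in range(periods):
        p = gp_ts_choose_price(prices, x_obs, y_obs, rng, noise=noise)
        d = true_demand(p) + rng.normal(0.0, noise)  # observed (noisy) demand
        x_obs.append(p)
        y_obs.append(d)
        revenue.append(p * d)
    return np.array(x_obs), np.array(revenue)
```

For example, `simulate(lambda p: 1.0 - p, np.linspace(0.1, 0.9, 9))` runs 50 periods against a hypothetical linear demand curve. Refitting the GP from scratch each period costs O(n^3) in the number of observations n. That is acceptable for a sketch, and incremental Cholesky updates are one possible refinement.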
Keywords: dynamic pricing, revenue management, demand learning, Gaussian processes