Dynamic Online Pricing with Incomplete Information Using Multi-Armed Bandit Experiments
61 pages. Posted: 8 Jun 2017. Last revised: 25 Oct 2018.
Date Written: February 13, 2018
Pricing managers at online retailers face a unique challenge: they must set real-time prices for a large number of products with incomplete information about demand. The manager runs price experiments to learn about each product's demand curve and its profit-maximizing price. In practice, balanced field price experiments can create high opportunity costs, because a large number of customers are presented with sub-optimal prices. In this paper, we propose an alternative dynamic price experimentation policy. The proposed approach extends multi-armed bandit (MAB) algorithms from statistical machine learning to incorporate microeconomic choice theory. Our automated pricing policy solves this MAB problem using a scalable, distribution-free algorithm. We prove analytically that our method is asymptotically optimal for any weakly downward-sloping demand curve. In a series of Monte Carlo simulations, we show that the proposed approach performs favorably compared with balanced field experiments and with standard dynamic pricing methods from computer science. In a calibrated simulation based on an existing pricing field experiment, we find that our algorithm can increase profits by 43% during the month of testing and by 4% annually.
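The abstract describes the method only at a high level. As a rough illustration of the general idea (candidate prices treated as bandit arms, with a demand-shape assumption used to sharpen the bounds), here is a minimal sketch under assumptions of our own: a discrete price grid, a constant unit cost, Bernoulli purchase decisions, and the standard UCB1 confidence bonus. The function `monotone_ucb_pricing` and its parameters are hypothetical. This is not the authors' algorithm, and it does not reproduce the paper's distribution-free method or its optimality proof. The only link to the text is that it uses a weakly downward-sloping demand curve to tighten each price's optimistic demand estimate.

```python
import math
import random


def monotone_ucb_pricing(prices, cost, buy_prob, horizon, seed=0):
    """Hypothetical sketch: UCB1 over a discrete price grid, where the upper
    bound on purchase probability is tightened by assuming demand is weakly
    decreasing in price. Returns how many customers saw each price."""
    rng = random.Random(seed)
    k = len(prices)
    order = sorted(range(k), key=lambda i: prices[i])  # arm indices, ascending price
    pulls = [0] * k  # customers shown each price
    sales = [0] * k  # purchases at each price
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # show every price once to initialise the estimates
        else:
            # UCB1 upper bound on each price's purchase probability, capped at 1.
            ucb = [
                min(1.0, sales[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i]))
                for i in range(k)
            ]
            # Weakly downward-sloping demand (assumption): d(p_high) <= d(p_low),
            # so each bound can be no larger than the bound at any lower price.
            running = 1.0
            for i in order:
                running = min(running, ucb[i])
                ucb[i] = running
            # Optimistic expected profit per customer at each price.
            index = [(prices[i] - cost) * ucb[i] for i in range(k)]
            arm = max(range(k), key=lambda i: index[i])
        pulls[arm] += 1
        if rng.random() < buy_prob(prices[arm]):  # simulated purchase decision
            sales[arm] += 1
    return pulls


# Usage with a made-up demand curve (illustrative numbers only).
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
demand = {1.0: 0.9, 2.0: 0.8, 3.0: 0.6, 4.0: 0.3, 5.0: 0.1}
print(monotone_ucb_pricing(prices, 0.0, demand.get, horizon=20000, seed=1))
```

In this made-up setting, expected profit per customer is 0.9, 1.6, 1.8, 1.2 and 0.5 at the five prices. Most customers should therefore end up seeing the price of 3.0, while the other prices are shown only often enough to rule them out. A balanced experiment, by contrast, would show each price to one fifth of customers.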
Keywords: dynamic pricing, ecommerce, online experiments, machine learning, multi-armed bandits, partial identification, minimax regret, non-parametric econometrics, A/B testing, field experiments