Dynamic Pricing in an Evolving and Unknown Marketplace
74 Pages. Posted: 6 Jun 2019. Last revised: 26 Jul 2021
Date Written: May 5, 2019
Abstract
We consider a firm that sells a single type of product in multiple local markets over a finite horizon via dynamically adjusted prices. To prevent price discrimination, the prices posted in different local markets at any given time must be identical. The horizon contains one or more change-points, and each local market's demand function evolves linearly over time between any two consecutive change-points. Each change-point is classified as either a zero-order or a first-order change-point according to how smoothly the demand functions change at that point. At a zero-order change-point, at least one local market's demand function changes abruptly. At a first-order change-point, all local markets' demand functions evolve continuously over time, but at least one local market's demand evolution speed changes abruptly. Before the start of the horizon, the firm has no information about any parameter governing the demand evolution process. The firm aims to find a pricing policy that yields as much revenue as possible. We show that the regret under any pricing policy is lower bounded by CT^{1/2} for some C>0, and that the lower bound worsens to CT^{2/3} if at least one change-point is a first-order change-point.
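Stated a bit more formally (in our own notation, since the abstract does not fix one), the demand model and the two lower bounds can be written as in the hedged sketch below; this is a restatement for orientation, not the paper's exact formulation.

```latex
% Hedged restatement in our own notation (not taken from the paper).
% Between consecutive change-points \tau_k < t \le \tau_{k+1}, market i's
% expected demand at the posted price p evolves linearly in time:
\[
  d_i(p, t) \;=\; \alpha_i^{(k)}(p) \;+\; \beta_i^{(k)}(p)\,\bigl(t - \tau_k\bigr),
  \qquad \tau_k < t \le \tau_{k+1}.
\]
% \tau_{k+1} is a zero-order change-point if some market's demand function
% jumps there, and a first-order change-point if every d_i(p,\cdot) is
% continuous at \tau_{k+1} but some evolution speed \beta_i changes abruptly.
% With R_T(\pi) denoting the T-period regret of pricing policy \pi,
% the lower bounds read
\[
  \sup_{\text{instances}} R_T(\pi) \;\ge\; C\,T^{1/2}
  \quad\text{for every policy } \pi,
\]
\[
  \sup_{\text{instances with a first-order change-point}} R_T(\pi) \;\ge\; C'\,T^{2/3}.
\]
```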
We propose a Joint Change-Point Detection and Time-adjusted Upper Confidence Bound (CU) algorithm. This algorithm consists of two components: a change-point detection component and an exploration-exploitation component. In the change-point detection component, the firm samples each price once per batch, where the batches are time intervals of equal length. The firm uses the sales data collected at these uniform-sampling times both to detect whether a change has occurred and, if one has, to judge whether it is a zero-order or a first-order change. In the exploration-exploitation component, the firm runs a time-adjusted upper confidence bound (UCB) algorithm between two consecutive detected change-points. Because demand evolves dynamically between two consecutive change-points, we introduce a time factor into the classical UCB algorithm to correct the bias that arises from using historical sales data to estimate current demand. We prove that the CU algorithm attains the regret lower bounds (up to logarithmic factors), and our numerical study shows that the policy performs well across a wide range of market environments.
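To make the two components concrete, here is a minimal Python sketch under our own simplifying assumptions: a finite grid of candidate prices, geometric discounting playing the role of the time adjustment, and a simple threshold test on batch-to-batch demand differences as the change detector. All simulator parameters (the expected_demand function, the price grid, the batch length, the discount factor, and the detection threshold) are hypothetical; the sketch illustrates the general structure only and is not the paper's CU algorithm. In particular, it does not distinguish zero-order from first-order changes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative market simulator (hypothetical parameters, not from the paper):
# expected demand a(t) - b(t) * p drifts linearly in t and jumps once at
# t = T // 2, i.e. a single zero-order change-point.
def expected_demand(p, t, T):
    if t < T // 2:
        a, b = 10.0 + 0.002 * t, 1.0
    else:
        a, b = 7.0 + 0.001 * (t - T // 2), 0.8
    return max(a - b * p, 0.0)

def observe_demand(p, t, T):
    return expected_demand(p, t, T) + rng.normal(0.0, 0.5)

# Minimal sketch of the two components described in the abstract.  Our own
# simplifications: a finite price grid, geometric discounting as the "time
# adjustment", and a threshold test on batch-to-batch demand differences as
# the change detector.
prices = np.linspace(1.0, 8.0, 8)
T, batch_len, detect_thresh, discount = 2000, 40, 3.0, 0.99

wrev = np.zeros(len(prices))   # discounted revenue sums per price
wcnt = np.zeros(len(prices))   # discounted pull counts per price
prev_batch, cur_batch = None, []
total_revenue = 0.0

for t in range(T):
    pos = t % batch_len
    if pos < len(prices):
        k = pos                                       # uniform sampling pass
    else:
        mean = wrev / np.maximum(wcnt, 1e-9)
        bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(wcnt, 1e-9))
        k = int(np.argmax(mean + bonus))              # time-adjusted UCB pass

    p = prices[k]
    d = observe_demand(p, t, T)
    total_revenue += p * d

    wrev *= discount                                  # down-weight stale data
    wcnt *= discount
    wrev[k] += p * d
    wcnt[k] += 1.0

    if pos < len(prices):
        cur_batch.append(d)
    if pos == batch_len - 1:                          # end of batch: change test
        cur = np.asarray(cur_batch)
        if prev_batch is not None and np.max(np.abs(cur - prev_batch)) > detect_thresh:
            wrev[:] = 0.0                             # restart the UCB statistics
            wcnt[:] = 0.0                             # after a detected change
        prev_batch, cur_batch = cur, []

print(f"total revenue collected over {T} periods: {total_revenue:.1f}")
```

In this sketch, detection triggers a full restart of the UCB statistics; discounting alone handles the gradual drift between change-points by letting stale observations fade.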
Keywords: revenue management, dynamic pricing, online learning, multi-armed bandit, change-point detection, asymptotic optimality