Dynamic Pricing in an Evolving and Unknown Marketplace

63 Pages. Posted: 6 Jun 2019

Yiwei Chen

University of Cincinnati - Lindner College of Business

Zheng Wen

Adobe Research

Yao Xie

Georgia Institute of Technology

Date Written: May 5, 2019


We consider a firm that sells a single type of product in multiple local markets over a finite horizon by dynamically adjusting prices. To prevent price discrimination, the firm posts the same price in all local markets at any given time. The horizon contains one or more change points. Between any two consecutive change points, each local market's demand function evolves linearly over time. Each change point is classified as either a zero-order or a first-order change point according to how smoothly the demand functions change at that point. At a zero-order change point, at least one local market's demand function changes abruptly. At a first-order change point, all local markets' demand functions evolve continuously over time, but at least one local market's speed of demand evolution changes abruptly. Before the horizon begins, the firm has no information about any of the parameters that govern the demand evolution process. The firm aims to find a pricing policy that maximizes its revenue. We show that the regret of any pricing policy is lower bounded by CT^{1/2} for some constant C > 0, where T is the length of the horizon, and that this lower bound worsens to CT^{2/3} if at least one change point is a first-order change point.
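The abstract describes the demand model only in words. The sketch below is a minimal illustration, assuming a finite set of candidate prices and a mean demand (for each market at each price) that is piecewise linear in time. It shows what separates a zero-order change point (a jump in demand) from a first-order one (continuous demand whose slope jumps), and how expected regret is accumulated. The names `Segment`, `mean_demand`, `classify_change_point`, and `expected_regret` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Segment:
    """Hypothetical: one linear piece of a market's mean demand at a fixed price."""
    start: float  # time at which this piece begins
    level: float  # mean demand at `start`
    slope: float  # speed of demand evolution within the piece


def mean_demand(t: float, segments: Sequence[Segment]) -> float:
    """Mean demand at time t. `segments` must be sorted by `start`,
    and the first segment must begin at or before t."""
    active = [s for s in segments if s.start <= t][-1]
    return active.level + active.slope * (t - active.start)


def classify_change_point(pairs: Sequence[Tuple[Segment, Segment]],
                          tol: float = 1e-9) -> str:
    """Classify one change point from (piece before, piece after) pairs,
    one pair per market/price. Zero-order if any demand jumps; first-order
    if every demand is continuous but at least one evolution speed jumps."""
    jumps = speed_changes = False
    for before, after in pairs:
        # Value the "before" piece would reach at the change time.
        value_at_change = before.level + before.slope * (after.start - before.start)
        if abs(after.level - value_at_change) > tol:
            jumps = True
        if abs(after.slope - before.slope) > tol:
            speed_changes = True
    if jumps:
        return "zero-order"
    if speed_changes:
        return "first-order"
    return "no change"


def expected_regret(prices: Sequence[float],
                    demand_fns: Sequence[Callable[[int], float]],
                    chosen: Sequence[int]) -> float:
    """Sum over periods of (best expected revenue) minus (expected revenue
    of the chosen price). demand_fns[k](t) is the total mean demand across
    all local markets at price k, reflecting the common-price constraint."""
    regret = 0.0
    for t, k in enumerate(chosen):
        best = max(p * d(t) for p, d in zip(prices, demand_fns))
        regret += best - prices[k] * demand_fns[k](t)
    return regret


# Usage: market `a` is continuous at t=4 but changes speed; market `b` jumps.
a = [Segment(0, 10.0, -0.5), Segment(4, 8.0, 0.0)]
b = [Segment(0, 5.0, 0.0), Segment(4, 7.0, 0.0)]
print(mean_demand(3, a))                                     # 8.5
print(classify_change_point([(a[0], a[1])]))                 # first-order
print(classify_change_point([(a[0], a[1]), (b[0], b[1])]))   # zero-order
```

A single jumping market is enough to make the whole change point zero-order, which mirrors the "at least one local market" wording in the abstract.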

We propose a Joint Change-point Detection and Time-adjusted Upper Confidence Bound (CU) algorithm, which consists of two components: a change-point detection component and an exploration-exploitation component. In the change-point detection component, the horizon is divided into batches of equal length, and the firm uniformly samples each price exactly once in every batch. She uses the sales data collected at these sampling times both to detect whether a change has occurred and, if one has, to determine whether it is a zero-order or a first-order change. In the exploration-exploitation component, the firm runs the upper confidence bound (UCB) algorithm between two consecutive detected change points. Because demand evolves linearly in time between two consecutive change points, we introduce a time factor into the UCB algorithm to correct the bias that arises from using historical sales data to estimate demand at the present time. We show that the CU algorithm attains the regret lower bounds up to logarithmic factors.
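The sketch below illustrates two ideas from the paragraph above under stated assumptions; it is not the paper's CU algorithm. First, a batch schedule that reserves one slot per price in each equal-length batch for change-point detection (placing those slots at the start of the batch is our simplification). Second, a time-adjusted UCB index: because demand is assumed linear in time since the last detected change point, each price's demand is estimated by a least-squares line extrapolated to the current period rather than by a plain sample mean. The confidence bonus uses the standard OLS prediction-variance factor, which grows as the current time moves away from the observed data; this specific bonus, the constant `c`, and all function names are hypothetical. `history` should hold only observations since the most recent detected change point.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical: price index -> (observation times, observed sales) since the last change point.
History = Dict[int, Tuple[List[float], List[float]]]


def detection_slot(t: int, batch_len: int, n_prices: int) -> int:
    """Price index to sample in period t if t is a detection slot, else -1.
    Assumption: each batch reserves its first `n_prices` periods, one per price."""
    offset = t % batch_len
    return offset if offset < n_prices else -1


def time_adjusted_estimate(times: Sequence[float], sales: Sequence[float],
                           t_now: float) -> Tuple[float, float]:
    """Fit a least-squares line through (time, sales) and extrapolate to t_now.
    Returns (predicted mean demand at t_now, variance factor of that prediction)."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(sales) / n
    sxx = sum((x - t_bar) ** 2 for x in times)
    if sxx == 0.0:  # all observations at one time: no slope information
        return y_bar, math.inf
    sxy = sum((x - t_bar) * (y - y_bar) for x, y in zip(times, sales))
    slope = sxy / sxx
    prediction = y_bar + slope * (t_now - t_bar)
    # OLS prediction-variance factor: larger when t_now is far from the data's center.
    variance_factor = 1.0 / n + (t_now - t_bar) ** 2 / sxx
    return prediction, variance_factor


def choose_price(prices: Sequence[float], history: History, t_now: float,
                 horizon: int, c: float = 1.0) -> int:
    """Time-adjusted UCB: pick the price maximizing price * (extrapolated demand + bonus).
    Assumes all prices are positive."""
    best_k, best_index = 0, -math.inf
    for k, p in enumerate(prices):
        times, sales = history.get(k, ([], []))
        if len(times) < 2:
            return k  # need two observations before a trend can be fitted
        prediction, var_factor = time_adjusted_estimate(times, sales, t_now)
        bonus = c * math.sqrt(math.log(horizon) * var_factor)
        index = p * (prediction + bonus)
        if index > best_index:
            best_k, best_index = k, index
    return best_k


# Usage: price 0 has falling demand, price 1 has flat demand.
history = {0: ([0, 1, 2, 3], [10, 9, 8, 7]),
           1: ([0, 1, 2, 3], [1, 1, 1, 1])}
print(time_adjusted_estimate(*history[0], t_now=4))          # (6.0, 1.5)
print(choose_price([1.0, 2.0], history, t_now=4, horizon=100))  # 0
```

A plain sample mean would estimate price 0's demand at 8.5, overstating the current demand of 6.0; the extrapolated line removes that bias, which is the role the abstract assigns to the time factor.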

Keywords: revenue management, dynamic pricing, online learning, multi-armed bandit, change-point detection, asymptotic optimality

Suggested Citation

Chen, Yiwei and Wen, Zheng and Xie, Yao, Dynamic Pricing in an Evolving and Unknown Marketplace (May 5, 2019). Available at SSRN: https://ssrn.com/abstract=3382957 or http://dx.doi.org/10.2139/ssrn.3382957

Yiwei Chen (Contact Author)

University of Cincinnati - Lindner College of Business

P.O. Box 210195
Cincinnati, OH 45221-0195
United States

Zheng Wen

Adobe Research

321 Park Avenue
San Jose, CA 95113

Yao Xie

Georgia Institute of Technology

Atlanta, GA 30332
United States
