Dynamic Pricing in an Evolving and Unknown Marketplace

74 Pages. Posted: 6 Jun 2019. Last revised: 26 Jul 2021.


Yiwei Chen

Temple University - Fox School of Business and Management

Zheng Wen

Adobe Research

Yao Xie

Georgia Institute of Technology

Date Written: May 5, 2019


We consider a firm that sells a single type of product in multiple local markets over a finite horizon at dynamically adjusted prices. To prevent price discrimination, the firm posts the same price in all local markets at any given time. The horizon contains one or more change-points. Between any two consecutive change-points, each local market's demand function evolves linearly over time. Each change-point is classified as either zero-order or first-order according to how smoothly the demand function changes there. At a zero-order change-point, at least one local market's demand function changes abruptly. At a first-order change-point, all local markets' demand functions evolve continuously over time, but the speed of demand evolution changes abruptly in at least one local market. Before the start of the horizon, the firm has no information about any parameter that governs the demand evolution process. The firm seeks a pricing policy that yields as much revenue as possible. We show that the regret under any pricing policy is lower bounded by CT^{1/2} with C>0, and that the lower bound worsens to CT^{2/3} if at least one change-point is a first-order change-point.
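To make the two kinds of change-point concrete, the following is a minimal sketch of a piecewise-linear demand path, assuming a single local market, a single fixed price, and noiseless mean demand. The names `Segment`, `mean_demand`, and `classify_change` are hypothetical and do not come from the paper; the classification rule simply restates the definitions above (a jump in level is zero-order, a continuous level with a change in slope is first-order).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: int        # first period of the segment (inclusive)
    intercept: float  # mean demand at period `start`
    slope: float      # change in mean demand per period


def mean_demand(t: int, segments: List[Segment]) -> float:
    """Mean demand at period t: linear within the segment that contains t."""
    active = max((s for s in segments if s.start <= t), key=lambda s: s.start)
    return active.intercept + active.slope * (t - active.start)


def classify_change(prev: Segment, nxt: Segment, tol: float = 1e-9) -> str:
    """Label the change-point at nxt.start for one market.

    zero-order: the demand level jumps at the change-point.
    first-order: the level is continuous but the slope changes.
    """
    level_just_before = prev.intercept + prev.slope * (nxt.start - prev.start)
    if abs(level_just_before - nxt.intercept) > tol:
        return "zero-order"
    if abs(prev.slope - nxt.slope) > tol:
        return "first-order"
    return "none"


# Illustrative numbers only (not taken from the paper).
kink = [Segment(0, 10.0, 0.5), Segment(10, 15.0, -0.2)]  # level 15 is reached continuously
jump = [Segment(0, 10.0, 0.5), Segment(10, 12.0, 0.5)]   # level drops from 15 to 12
```

With several local markets, the definitions above imply that a change-point is zero-order if any market's label is "zero-order", and first-order if no market jumps but at least one market's slope changes.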

We propose a Joint Change-Point Detection and Time-adjusted Upper Confidence Bound (CU) algorithm. The algorithm has two components: change-point detection and exploration-exploitation. In the change-point detection component, the firm uniformly samples each price once in each batch, where the batches are time intervals of equal length. She uses the sales data collected at these sampling times both to detect whether a change has occurred and, if so, to judge whether it is a zero-order or a first-order change. In the exploration-exploitation component, the firm runs a time-adjusted upper confidence bound (UCB) algorithm between two consecutive detected change-points. Because demand evolves between two consecutive change-points, we introduce a time factor into the classical UCB algorithm to correct the bias that arises when historical sales data are used to estimate current demand. We show theoretically that our CU algorithm achieves the regret lower bounds up to logarithmic factors. Our numerical study shows that the policy performs well across a wide range of market environments.
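The idea of adding a time factor to UCB can be illustrated with a rough sketch. The abstract does not give the paper's index or confidence radius, so everything below is an assumption made for illustration: demand at each price is extrapolated to the current period with an ordinary least-squares line in time, the bonus is the textbook `sqrt(2 log t / n)` term, and the names `extrapolated_demand`, `time_adjusted_index`, and `choose_price` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Obs = Tuple[int, float]  # (period in which the price was posted, observed sales)


def extrapolated_demand(obs: List[Obs], t: int) -> float:
    """Least-squares line of sales against time, evaluated at the current period t."""
    n = len(obs)
    mean_s = sum(s for s, _ in obs) / n
    mean_d = sum(d for _, d in obs) / n
    sxx = sum((s - mean_s) ** 2 for s, _ in obs)
    if sxx == 0:  # all observations share one period: no trend can be estimated
        return mean_d
    slope = sum((s - mean_s) * (d - mean_d) for s, d in obs) / sxx
    return mean_d + slope * (t - mean_s)


def time_adjusted_index(price: float, obs: List[Obs], t: int) -> float:
    """Optimistic revenue index: price times (trend-corrected demand + bonus)."""
    if len(obs) < 2:
        return math.inf  # force exploration until a trend can be fitted
    bonus = math.sqrt(2 * math.log(t) / len(obs))  # assumed radius, not the paper's
    return price * (extrapolated_demand(obs, t) + bonus)


def choose_price(history: Dict[float, List[Obs]], t: int) -> float:
    """Pick the price with the largest index, using data since the last detected change."""
    return max(history, key=lambda p: time_adjusted_index(p, history[p], t))


# Illustrative data only: demand at price 1.0 is falling, at price 2.0 rising.
history = {
    1.0: [(1, 10.0), (2, 9.0), (3, 8.0)],
    2.0: [(1, 3.0), (2, 4.0), (3, 5.0)],
}
```

In this toy data, a plain sample average would estimate demand at price 1.0 as 9.0, whereas the trend-corrected estimate for period 5 is 6.0; that gap is the kind of bias the time factor is meant to remove. In the full algorithm, `history` would be reset whenever the detection component flags a change-point.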

Keywords: revenue management, dynamic pricing, online learning, multi-armed bandit, change-point detection, asymptotic optimality

Suggested Citation

Chen, Yiwei and Wen, Zheng and Xie, Yao, Dynamic Pricing in an Evolving and Unknown Marketplace (May 5, 2019). Available at SSRN: https://ssrn.com/abstract=3382957 or http://dx.doi.org/10.2139/ssrn.3382957

Yiwei Chen (Contact Author)

Temple University - Fox School of Business and Management

Philadelphia, PA 19122
United States

Zheng Wen

Adobe Research

321 Park Avenue
San Jose, CA 95113

Yao Xie

Georgia Institute of Technology

Atlanta, GA 30332
United States
