Optimal Policies for Dynamic Pricing and Inventory Control with Nonparametric Censored Demands

96 pages. Posted: 17 Dec 2020. Last revised: 27 Feb 2023.

Boxiao Chen

University of Illinois at Chicago - College of Business Administration

Yining Wang

University of Texas at Dallas

Yuan Zhou

Tsinghua University - Yau Mathematical Sciences Center

Date Written: December 16, 2020

Abstract

We study the fundamental model in joint pricing and inventory replenishment control under the learning-while-doing framework, with T consecutive review periods and a firm that does not know the demand curve a priori. At the beginning of each period, the firm sets both a price and an inventory order-up-to level, then collects revenue from realized demand while incurring costs from either holding unsold inventory or lost sales due to unsatisfied demand. We make the following contributions to this fundamental problem:

1. We propose a novel inversion method based on empirical measures to consistently estimate the difference of the instantaneous reward functions at two prices. This directly tackles the fundamental challenge posed by censored demands, without raising the order-up-to levels unnaturally high to collect more demand information. Based on this technical innovation, we design bisection and trisection search methods that attain an O(T^{1/2}) regret, assuming the reward function is concave and only twice continuously differentiable.

2. In the more general case of non-concave reward functions, we design an active tournament elimination method that attains O(T^{3/5}) regret, again building on consistent estimates of reward differences at two prices.

3. We complement the O(T^{3/5}) regret upper bound with a matching \Omega(T^{3/5}) regret lower bound. The lower bound is established by a novel information-theoretical argument based on generalized squared Hellinger distance, which is significantly different from conventional arguments that are based on Kullback-Leibler divergence. This lower bound shows that no learning-while-doing algorithm could achieve O(T^{1/2}) regret without assuming the reward function is concave, even if the sales revenue as a function of demand rate or price is concave.
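
To illustrate the search idea in item 1, here is a minimal Python sketch of trisection search for the maximizer of a concave function on an interval. It is not the paper's algorithm: it assumes the reward can be evaluated exactly, whereas the method described above compares two prices through reward differences estimated from censored data. The names `trisection_maximize` and `toy_reward` are hypothetical and introduced only for this example.

```python
from typing import Callable


def trisection_maximize(
    reward: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Approximate the maximizer of a concave function on [lo, hi].

    Each round compares the reward at the two interior trisection
    points and discards the outer third that cannot hold the maximizer.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if reward(m1) < reward(m2):
            lo = m1  # by concavity, the maximizer is not in [lo, m1)
        else:
            hi = m2  # by concavity, the maximizer is not in (m2, hi]
    return 0.5 * (lo + hi)


def toy_reward(price: float) -> float:
    """Hypothetical concave reward, maximized at price = 5."""
    return price * (10.0 - price)


best_price = trisection_maximize(toy_reward, 0.0, 10.0)
print(f"approximate maximizer: {best_price:.4f}")
```

Only the sign of the comparison between the two interior points matters in each round, which is why a consistent estimate of the reward difference at two prices is enough to drive this kind of search.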

Both the upper bound technique based on the "difference estimator" and the lower bound technique based on generalized Hellinger distance are new in the literature, and can potentially be applied to other inventory or censored-demand problems that involve learning.
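
As a rough illustration of the setting, the following Python sketch computes one period's profit under lost sales, using the revenue, holding cost, and lost-sales cost described in the abstract. The function names, parameter names, and numeric values are assumptions made for this example and do not come from the paper; any cost terms not mentioned in the abstract are left out.

```python
def observed_sales(demand: float, order_up_to: float) -> tuple[float, bool]:
    """Return (sales, censored): sales are capped at the inventory level."""
    sales = min(demand, order_up_to)
    censored = demand >= order_up_to  # a stockout hides any demand above the cap
    return sales, censored


def period_profit(
    price: float,
    order_up_to: float,
    demand: float,
    holding_cost: float,
    lost_sales_cost: float,
) -> float:
    """One-period profit: revenue minus holding and lost-sales costs."""
    sales, _ = observed_sales(demand, order_up_to)
    leftover = max(order_up_to - demand, 0.0)  # unsold units held in stock
    unmet = max(demand - order_up_to, 0.0)     # demand lost to the stockout
    return price * sales - holding_cost * leftover - lost_sales_cost * unmet


# Hypothetical numbers: demand of 12 against an order-up-to level of 10.
profit_stockout = period_profit(5.0, 10.0, 12.0, 1.0, 2.0)
# Hypothetical numbers: demand of 7 against the same level.
profit_leftover = period_profit(5.0, 10.0, 7.0, 1.0, 2.0)
print(profit_stockout, profit_leftover)
```

In the stockout case the firm sees only the 10 units sold, so the `unmet` term is not directly observable from sales data. This is the censoring difficulty that the difference estimator is designed to work around.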

Keywords: dynamic pricing, inventory replenishment, censored demand, lost sales, regret minimization, bandit learning, non-concavity

Suggested Citation

Chen, Boxiao and Wang, Yining and Zhou, Yuan, Optimal Policies for Dynamic Pricing and Inventory Control with Nonparametric Censored Demands (December 16, 2020). Available at SSRN: https://ssrn.com/abstract=3750413 or http://dx.doi.org/10.2139/ssrn.3750413

Boxiao Chen (Contact Author)

University of Illinois at Chicago - College of Business Administration

601 S Morgan St
Chicago, IL 60607
United States

Yining Wang

University of Texas at Dallas

2601 North Floyd Road
Richardson, TX 75083
United States

Yuan Zhou

Tsinghua University - Yau Mathematical Sciences Center

Beijing
China