Online Learning and Optimization of (Some) Cyclic Pricing Policies in the Presence of Patient Customers
34 Pages. Posted: 22 Mar 2018. Last revised: 27 Sep 2018.
Date Written: March 19, 2018
We consider the joint learning and optimization problem of cyclic pricing policies in the presence of patient customers. In our problem, some customers are patient: they are willing to wait in the system for several periods until the price drops to or below their valuation, and then make a purchase. We assume that customers are heterogeneous in both their valuation and their patience level. However, the decision maker (i.e., the seller) does not know the joint distribution of customers' valuation and patience level a priori, and can learn only from the realized total sales in each period, which are subject to noise. The seller also cannot distinguish between customer types and cannot observe the number of patient customers waiting in the system. In this paper, we first introduce a learning algorithm that, using only total sales information, converges to an optimal decreasing cyclic policy with logarithmic regret. We then introduce a larger family of policies, called threshold-regulated policies, which contains both the decreasing cyclic policies and the nested decreasing cyclic policies. For this broader family, our second learning algorithm converges to an optimal threshold-regulated policy at a near-optimal rate. Extensive numerical studies show that both learning algorithms converge very efficiently and that the gap between the expected average revenue of an optimal threshold-regulated policy and that of an optimal policy is usually negligible.
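To make the setting concrete, here is a minimal sketch that evaluates one fixed decreasing cyclic policy under a simple reading of the demand model described above. It does not implement either learning algorithm; it only simulates a given policy. Several details are assumptions made for illustration, not taken from the paper: the same customer types arrive deterministically every period, a customer with patience p who arrives in period t stays through period t + p and buys in the first period whose price is at or below their valuation, and the seller's observation is the true sales count plus Gaussian noise. The names `CustomerType` and `simulate_decreasing_cyclic_policy` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerType:
    valuation: float  # highest price this customer is willing to pay
    patience: int     # extra periods the customer waits after arriving (0 = impatient)


def simulate_decreasing_cyclic_policy(prices, arrivals_per_period, num_cycles,
                                      noise_sd=0.0, seed=0):
    """Evaluate one decreasing cyclic pricing policy in a toy patient-customer model.

    Assumptions (illustrative, not from the paper): the same customer types arrive
    every period; a customer arriving in period t stays through period t + patience
    and buys in the first period whose price is <= their valuation; the seller
    observes total sales plus Gaussian noise.

    Returns (average revenue per period, list of noisy observed sales).
    """
    if any(a < b for a, b in zip(prices, prices[1:])):
        raise ValueError("prices within a cycle must be non-increasing")
    rng = random.Random(seed)
    horizon = len(prices) * num_cycles
    waiting = []  # (customer, last period in which the customer is still present)
    total_revenue = 0.0
    observed_sales = []
    for t in range(horizon):
        price = prices[t % len(prices)]
        # New arrivals join everyone still waiting from earlier periods.
        waiting.extend((c, t + c.patience) for c in arrivals_per_period)
        # Every present customer whose valuation covers the price buys now.
        sales = sum(1 for c, _ in waiting if c.valuation >= price)
        total_revenue += price * sales
        # Non-buyers stay only if their patience extends beyond period t.
        waiting = [(c, last) for c, last in waiting
                   if c.valuation < price and last > t]
        # The seller sees only the total, corrupted by noise; customer types and
        # the size of the waiting pool stay hidden.
        observed_sales.append(sales + rng.gauss(0.0, noise_sd))
    return total_revenue / horizon, observed_sales


if __name__ == "__main__":
    arrivals = [CustomerType(12, 0), CustomerType(12, 0), CustomerType(7, 1)]
    for cycle in ([12, 7], [12], [7]):
        avg, _ = simulate_decreasing_cyclic_policy(cycle, arrivals, num_cycles=4)
        print(cycle, avg)  # prints 26.0, 24.0, 21.0 respectively
```

In this toy instance, alternating prices 12 and 7 earns 26 per period on average, more than charging either price constantly (24 at price 12, 21 at price 7): patient low-valuation customers wait for the low-price period, while impatient high-valuation customers who arrive in a high-price period pay the high price. The problem studied in the paper is harder, because the seller sees only the noisy `observed_sales` and must learn a good policy from them.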