Dynamic Pricing and Learning with Discounting

Forthcoming in Operations Research

43 Pages. Posted: 28 Sep 2022. Last revised: 3 Apr 2023.

Zhichao Feng

The Hong Kong Polytechnic University

Milind Dawande

University of Texas at Dallas - Department of Information Systems & Operations Management

Ganesh Janakiraman

University of Texas at Dallas - Naveen Jindal School of Management

Anyan Qi

University of Texas at Dallas - Naveen Jindal School of Management

Date Written: January 19, 2023

Abstract

In many practical settings, learning algorithms can take a substantial amount of time to converge, thereby raising the need to understand the role of discounting in learning. We illustrate the impact of discounting on the performance of learning algorithms by examining two classic and representative dynamic-pricing-and-learning problems studied in Broder and Rusmevichientong (2012) [BR] and Keskin and Zeevi (2014) [KZ]. In both settings, a seller sells a product with unlimited inventory over T periods. The seller initially does not know the parameters of the general choice model in BR (resp., the linear demand curve in KZ). Given a discount factor ρ, the seller's objective is to determine a pricing policy that maximizes the expected discounted revenue over the T periods. In both settings, we establish lower bounds on the regret under any policy, with limiting bounds of Ω(√(1/(1-ρ))) and Ω(√T) as T → ∞ and ρ → 1, respectively. In the model of BR with discounting, we propose an asymptotically tight learning policy and show that the regret under our policy, as well as that under the MLE-CYCLE policy in BR, is O(√(1/(1-ρ))) (resp., O(√T)) as T → ∞ (resp., ρ → 1). In the model of KZ with discounting, we present sufficient conditions for a learning policy to guarantee asymptotic optimality, and show that the regret under any policy satisfying these conditions is O(log(1/(1-ρ)) √(1/(1-ρ))) (resp., O(log T √T)) as T → ∞ (resp., ρ → 1). We show that three different policies, namely the two variants of the greedy Iterated-Least-Squares policy in KZ and a different policy that we propose, achieve this upper bound on the regret. We numerically examine the behavior of the regret under our policies, as well as under those in BR and KZ, in the presence of discounting. We also analyze a setting in which the discount factor per period is a function of the number of decision periods in the planning horizon.
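One way to read these bounds: since 1 + ρ + ρ² + … = 1/(1-ρ), the discount factor induces an effective horizon of 1/(1-ρ) periods, so the discounted bounds mirror the familiar undiscounted √T rates. To make the KZ-style setting concrete, the following is a minimal Python sketch of a constrained-ILS-style pricing loop with discounted-regret accounting. The demand parameters, price bounds, and the t^(-1/4) price-dispersion safeguard are illustrative assumptions in the spirit of KZ's constrained iterated least squares, not the exact specifications used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown) linear demand: D_t = alpha + beta * p_t + noise, with beta < 0.
alpha, beta, sigma = 10.0, -2.0, 0.5
rho = 0.995            # per-period discount factor
T = 2000               # number of periods
p_lo, p_hi = 0.5, 4.5  # admissible price range

p_star = -alpha / (2 * beta)               # true revenue-maximizing price
r_star = p_star * (alpha + beta * p_star)  # optimal expected per-period revenue

prices, demands = [], []
discounted_regret = 0.0

for t in range(1, T + 1):
    if t <= 2:
        # Two distinct initial prices so the least-squares fit is identified.
        p = p_lo if t == 1 else p_hi
    else:
        # Ordinary least squares on the price-demand history so far.
        X = np.column_stack([np.ones(len(prices)), prices])
        a_hat, b_hat = np.linalg.lstsq(X, np.array(demands), rcond=None)[0]
        # Greedy (certainty-equivalent) price from the current estimates.
        p = -a_hat / (2 * b_hat) if b_hat < 0 else p_hi
        # Constrained-ILS-style safeguard: if the greedy price sits too close
        # to the running average price, perturb it to maintain dispersion.
        p_bar = float(np.mean(prices))
        delta = t ** -0.25
        direction = 1.0 if p >= p_bar else -1.0
        if abs(p - p_bar) < delta:
            p = p_bar + direction * delta
        p = float(np.clip(p, p_lo, p_hi))
    d = alpha + beta * p + sigma * rng.standard_normal()
    prices.append(p)
    demands.append(d)
    # Regret in expectation: optimal expected revenue minus expected revenue at p.
    discounted_regret += rho ** (t - 1) * (r_star - p * (alpha + beta * p))

print(f"discounted regret over {T} periods: {discounted_regret:.2f}")
print(f"benchmark sqrt(1/(1-rho)) = {np.sqrt(1 / (1 - rho)):.2f}")
```

In this sketch, the dispersion safeguard keeps the price sequence from concentrating too quickly, which is the incomplete-learning failure mode of the unconstrained greedy policy that KZ's constrained variants are designed to avoid.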

Keywords: pricing, learning, discounting

JEL Classification: D42

Suggested Citation

Feng, Zhichao and Dawande, Milind and Janakiraman, Ganesh and Qi, Anyan, Dynamic Pricing and Learning with Discounting (January 19, 2023). Forthcoming in Operations Research. Available at SSRN: https://ssrn.com/abstract=4222004 or http://dx.doi.org/10.2139/ssrn.4222004

Zhichao Feng

The Hong Kong Polytechnic University

The Hong Kong Polytechnic University
Hung Hom, Kowloon
Hong Kong, 00000
Hong Kong

Milind Dawande

University of Texas at Dallas - Department of Information Systems & Operations Management

P.O. Box 830688
Richardson, TX 75083-0688
United States

Ganesh Janakiraman

University of Texas at Dallas - Naveen Jindal School of Management

P.O. Box 830688
Richardson, TX 75083-0688
United States

Anyan Qi (Contact Author)

University of Texas at Dallas - Naveen Jindal School of Management

P.O. Box 830688
Richardson, TX 75083-0688
United States
