# Markdown Pricing Under Unknown Demand

58 pages. Posted: 8 Jun 2021. Last revised: 5 Dec 2022.

## Su Jia

Carnegie Mellon University - David A. Tepper School of Business

## Andrew Li

affiliation not provided to SSRN

## R. Ravi

Carnegie Mellon University - David A. Tepper School of Business

Date Written: June 7, 2021

### Abstract

We consider the Unimodal Multi-Armed Bandit problem where the goal is to find the optimal price under an unknown unimodal reward function, with an additional "markdown" constraint that requires that the price exploration is non-increasing. This markdown optimization problem faithfully models a single-product revenue management problem where the objective is to adaptively reduce the price over a finite sales horizon to maximize expected revenues.

We measure the performance of an adaptive exploration-exploitation policy by its regret: the revenue loss relative to the maximum revenue that could have been attained had the demand curve been known in advance. For the case of $L$-Lipschitz-bounded unimodal revenue functions with infinite inventory, we present a natural policy that explores the price space at a uniform optimal speed in $T$ steps and has regret $O(T^{3/4} (L\log T)^{1/4})$.
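To make the markdown constraint concrete, the following is a minimal Python sketch of a uniform-speed price descent. It is an illustration only, not the policy analyzed in the paper: the function name `markdown_prices`, the grid width of order $T^{-1/4}$, the $\sqrt{T}$ rounds per price, the Gaussian noise model, and the stopping rule are all assumptions made for this example.

```python
import math
import random


def markdown_prices(mean_revenue, T, p_max=1.0, p_min=0.0, noise=0.1, seed=0):
    """Hypothetical markdown sketch: return the price charged in each of T rounds.

    mean_revenue(p) is the (unknown to the seller) expected revenue at price p;
    the seller only sees noisy samples of it.
    """
    rng = random.Random(seed)
    step = (p_max - p_min) * T ** -0.25              # assumed grid width
    m = max(1, math.ceil(math.sqrt(T)))              # assumed rounds per price
    radius = noise * math.sqrt(2 * math.log(T) / m)  # assumed confidence radius

    prices = []
    price = p_max
    best_mean = -math.inf
    while len(prices) < T:
        # Sample the current price for up to m rounds.
        n = min(m, T - len(prices))
        total = 0.0
        for _ in range(n):
            total += mean_revenue(price) + rng.gauss(0.0, noise)
            prices.append(price)
        mean = total / n
        best_mean = max(best_mean, mean)
        # Stop marking down once revenue has clearly dropped below the best
        # level seen so far, or once the next step would go below the floor.
        if mean < best_mean - 2 * radius or price - step < p_min:
            break
        price -= step
    # Commit to the current price for all remaining rounds.
    prices.extend([price] * (T - len(prices)))
    return prices


if __name__ == "__main__":
    path = markdown_prices(lambda p: p * (1 - p), T=10_000)
    print("final price:", path[-1])
```

Because the price is only ever lowered or held, the sequence is non-increasing by construction; any overshoot past the revenue peak cannot be undone, which is the source of the extra difficulty relative to unconstrained bandits.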

On the other hand, we provide an almost-matching lower bound of $\Omega(L^{1/4}T^{3/4})$ on the regret of any policy. Further, under mild assumptions, we show that the above tight bounds also hold when the inventory is finite but of order $\Omega(T)$. Our tight regret bounds highlight the additional complexity of the markdown constraint: they are asymptotically higher than the corresponding bounds without the markdown requirement, namely $\tilde{\Theta}(T^{1/2})$ for unimodal bandits and $\tilde{\Theta}(L^{1/3} T^{2/3})$ for $L$-Lipschitz bandits. Finally, we consider a generalization called Dynamic Pricing with Markup Penalty, where the seller is allowed to increase the price by paying a markup penalty of magnitude $O(T^c)$ per markup, where $c\in [0,1]$ is a given constant. We extend our results to a tight $\tilde O(T^{\mathrm{med}\{\frac{2}{3}, \frac{3}{4}, c\}})$ regret bound for this variant, where $\mathrm{med}\{a,b,c\}$ denotes the median of the numbers $a,b,c$.
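As a worked reading of the exponent in the markup-penalty bound: since $\frac{2}{3} < \frac{3}{4}$, the median of $\frac{2}{3}$, $\frac{3}{4}$, and $c$ reduces to a three-case expression in $c$.

$$
\mathrm{med}\left\{\tfrac{2}{3}, \tfrac{3}{4}, c\right\} =
\begin{cases}
\tfrac{2}{3}, & 0 \le c \le \tfrac{2}{3}, \\
c, & \tfrac{2}{3} \le c \le \tfrac{3}{4}, \\
\tfrac{3}{4}, & \tfrac{3}{4} \le c \le 1.
\end{cases}
$$

So the exponent equals the $\frac{2}{3}$ of the unconstrained $L$-Lipschitz bound when the penalty exponent $c$ is small, equals the $\frac{3}{4}$ of the pure markdown bound when $c$ is large, and equals $c$ itself in between.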

Keywords: markdown pricing, dynamic pricing, multi-armed bandits, revenue management, online learning

Suggested Citation

Jia, Su and Li, Andrew and Ravi, R., Markdown Pricing Under Unknown Demand (June 7, 2021). Available at SSRN: https://ssrn.com/abstract=3861379 or http://dx.doi.org/10.2139/ssrn.3861379