Online Learning for Constrained Assortment Optimization under Markov Chain Choice Model
52 Pages · Posted: 20 Apr 2022 · Last revised: 5 Apr 2023
Date Written: April 9, 2022
Abstract
We study a dynamic assortment selection problem in which arriving customers make purchase decisions among offered products from a universe of $N$ products under a Markov chain choice (MCC) model. The retailer observes only the offered assortment and the customer's single choice in each period. Given limited display capacity, resource constraints, and no \emph{a priori} knowledge of the problem parameters, the retailer's objective is to sequentially learn the choice model and maximize cumulative revenue over a selling horizon of length $T$. We develop an explore-then-exploit learning algorithm that balances the trade-off between exploration and exploitation. The algorithm simultaneously estimates the arrival and transition probabilities of the MCC model by solving systems of linear equations and then determines a near-optimal assortment based on these estimates. Moreover, our estimators are consistent and computationally faster than existing heuristic estimation methods, which either are inconsistent or incur a large computational burden.
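As background for the abstract's linear-equation estimation step, the sketch below computes purchase probabilities under a standard MCC formulation: each customer arrives at product $i$ with probability $\lambda_i$, buys $i$ if it is offered, and otherwise transitions to product $j$ with probability $\rho_{ij}$ until reaching an offered product or leaving. This is a minimal illustration (the function name, variable names, and the specific parameterization are ours, not the paper's), using the absorbing-chain identity $\pi_S = \lambda_S + \lambda_{S^c}(I - \rho_{S^c S^c})^{-1}\rho_{S^c S}$.

```python
import numpy as np

def mcc_choice_probs(lam, rho, S):
    """Purchase probabilities under a Markov chain choice (MCC) model.

    lam : length-N array of arrival probabilities (may sum to < 1,
          leaving mass for customers who never arrive at any product).
    rho : N x N substitution matrix; row i gives the transition
          probabilities used when product i is not offered (rows may
          sum to < 1, the remainder being the no-purchase option).
    S   : list of offered product indices.
    Returns a length-N vector of purchase probabilities (zero off S).
    """
    N = len(lam)
    Sc = [i for i in range(N) if i not in S]
    pi = np.zeros(N)
    if not Sc:                       # everything offered: buy on arrival
        pi[S] = lam[S]
        return pi
    # A customer arriving at an unoffered product keeps transitioning
    # among unoffered products until absorbed by an offered one (or
    # leaving). Absorption probabilities are (I - Q)^{-1} R.
    Q = rho[np.ix_(Sc, Sc)]          # unoffered -> unoffered
    R = rho[np.ix_(Sc, S)]           # unoffered -> offered
    M = np.linalg.solve(np.eye(len(Sc)) - Q, R)
    pi[S] = lam[S] + lam[Sc] @ M
    return pi

# Example with N = 2: if product 0 is withheld, 80% of its demand
# substitutes to product 1.
lam = np.array([0.5, 0.3])
rho = np.array([[0.0, 0.8],
                [0.6, 0.0]])
print(mcc_choice_probs(lam, rho, [1]))   # product 1 sells with prob 0.3 + 0.5*0.8 = 0.7
```

Inverting the estimation direction, i.e. recovering $\lambda$ and $\rho$ from observed purchase frequencies across assortments, amounts to solving the same kind of linear system, which is the consistency and speed advantage the abstract claims.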
Keywords: online learning, assortment planning, Markov chain choice model, capacity, regret analysis