Online Learning for Constrained Assortment Optimization under Markov Chain Choice Model
43 Pages Posted: 20 Apr 2022
Date Written: April 9, 2022
We study a dynamic assortment selection problem in which arriving customers choose among the offered products, drawn from a universe of $N$ products, according to a Markov-chain-based choice (MCBC) model. In each period the retailer observes only the offered assortment and the customer's single choice. Given limited display capacity, resource constraints, and no a priori knowledge of the problem parameters, the retailer's objective is to sequentially learn the choice model and maximize cumulative revenue over a selling horizon of length $T$. We develop an explore-then-exploit learning algorithm that balances the trade-off between exploration and exploitation. The algorithm simultaneously estimates the arrival and transition probabilities of the MCBC model by solving linear equations, and then determines a near-optimal assortment based on these estimates. Furthermore, whereas existing heuristic estimation methods suffer from inconsistency and a large computational burden, our estimators are consistent and require less computation time.
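To make the role of the arrival and transition probabilities concrete, the following minimal sketch computes choice probabilities under a standard Markov chain choice model and then picks a capacity-constrained assortment by brute force. It is an illustration under stated assumptions, not the estimator or the optimization routine of the paper: the function names, the convention that index 0 is the no-purchase option, and the toy numbers are all hypothetical, and the additional resource constraints are omitted.

```python
from itertools import combinations

import numpy as np


def mc_choice_probabilities(lam, rho, assortment):
    """Purchase probabilities under a Markov chain choice model.

    lam[i]    : probability that a customer arrives preferring option i
    rho[i, j] : probability of moving from i to j when i is not offered
    Index 0 is the no-purchase option (an assumption of this sketch);
    it is always available. Offered products and option 0 are absorbing.
    """
    lam = np.asarray(lam, dtype=float)
    rho = np.asarray(rho, dtype=float)
    n = len(lam)
    absorbing = sorted(set(assortment) | {0})
    transient = [i for i in range(n) if i not in absorbing]

    probs = np.zeros(n)
    probs[absorbing] = lam[absorbing]
    if transient:
        # Expected visits v to unavailable products solve
        # (I - rho_TT)^T v = lam_T, a plain linear system.
        rho_tt = rho[np.ix_(transient, transient)]
        visits = np.linalg.solve((np.eye(len(transient)) - rho_tt).T,
                                 lam[transient])
        probs[absorbing] += visits @ rho[np.ix_(transient, absorbing)]
    return probs


def best_assortment(lam, rho, revenue, capacity):
    """Brute-force search over assortments with at most `capacity` items.

    Only practical for small N; hypothetical stand-in for the
    exploitation step, ignoring resource constraints.
    """
    products = range(1, len(lam))
    best, best_value = (), 0.0
    for k in range(1, capacity + 1):
        for subset in combinations(products, k):
            value = float(mc_choice_probabilities(lam, rho, subset) @ revenue)
            if value > best_value:
                best, best_value = subset, value
    return best, best_value


# Toy instance with N = 2 products (all numbers are made up).
lam = np.array([0.2, 0.5, 0.3])
rho = np.array([[1.0, 0.0, 0.0],   # row 0 is never used: option 0 is absorbing
                [0.4, 0.0, 0.6],
                [0.5, 0.5, 0.0]])
revenue = np.array([0.0, 10.0, 4.0])

probs_only_product_1 = mc_choice_probabilities(lam, rho, [1])
chosen, expected_revenue = best_assortment(lam, rho, revenue, capacity=2)
```

In an explore-then-exploit scheme, `lam` and `rho` would be replaced by estimates obtained during the exploration phase, and the selected assortment would then be offered for the remaining periods.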
Keywords: online learning, assortment planning, Markov chain choice model, capacity, regret analysis