An Efficient Learning Framework for Multi-Product Inventory Systems with Customer Choices
47 Pages. Posted: 18 Feb 2021. Last revised: 22 Feb 2022.
Date Written: January 29, 2021
In this paper, we first introduce a periodic-review multi-product inventory system in which each customer's demand is affected by product availability and by the customer's preferences. Because customer preferences are not directly observable and are hard to estimate, when full distributional information about demand is unavailable, the decision-maker has to learn it on the fly from customers' partial and censored feedback. If one ignores the inventory dynamics, simply treats this learning problem as a Multi-Armed Bandit problem, and directly applies an existing algorithm such as the Upper Confidence Bound (UCB) algorithm, convergence can be extremely slow because the policy space is high-dimensional. We propose a UCB-based learning framework that exploits demand information through two improvement ideas. We illustrate how these two ideas can be incorporated by considering two specific systems: 1) a multi-product inventory system with stock-out substitutions, and 2) a multi-product inventory assortment problem for urban warehouses. We develop improved UCB algorithms for both systems using the two improvements. For both systems, the algorithm achieves a tight worst-case convergence rate (up to a logarithmic term) in the planning horizon T. Extensive numerical experiments demonstrate the efficiency of the improved UCB algorithms for the two systems. In the experiments, when there are more than 1000 candidate policies to choose from, the algorithms achieve around 15% average expected regret within 50 periods and continue to improve steadily as time increases.
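To make the "naive bandit" baseline concrete, here is a minimal sketch of the standard UCB1 algorithm that treats each candidate inventory policy as an independent arm. This is not the authors' improved algorithm; it only illustrates the approach the abstract says converges slowly. The function name `ucb1`, the Bernoulli reward model, and the arm means are hypothetical simplifications (real per-period inventory rewards under censored demand are neither Bernoulli nor independent across policies).

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run standard UCB1 on Bernoulli arms; each arm stands in for one candidate policy.

    Hypothetical setup: arm_means are the expected per-period rewards of the
    candidate policies, scaled to [0, 1] and unknown to the learner.
    Returns how often each arm was played and the cumulative pseudo-regret
    relative to always playing the best arm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm has been played
    sums = [0.0] * k      # total observed reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: play every arm once, in order
        else:
            # Choose the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - arm_means[arm]  # expected (pseudo-)regret of this play

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000, seed=1)
    print("plays per arm:", counts)
    print("pseudo-regret:", round(regret, 1))
```

Because UCB1 must play every arm once before it can compare them, a hypothetical instance with 1000 candidate policies would spend its first 1000 periods on round-robin initialisation alone, so after 50 periods it has tried only 50 policies. This is one way to see why directly applying UCB over a large policy space is slow, and why exploiting shared demand information across policies matters.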
Keywords: Inventory management, Online learning, Customer choices