Inventory Balancing with Online Learning
59 Pages. Posted: 2 Sep 2018. Last revised: 30 Aug 2021
Date Written: August 22, 2018
Abstract
We study a general problem of allocating limited resources to heterogeneous customers over time, under model uncertainty. Each type of customer can be serviced using different actions, each of which stochastically consumes some combination of resources and returns different rewards for the resources consumed. We consider a general model framework in which the resource consumption distribution associated with each (customer type, action) combination is not known, but is consistent and can be learned over time. In addition, the sequence of customer types to arrive over time is arbitrary and completely unknown. We achieve near optimality under both model uncertainty and customer heterogeneity by combining two algorithmic frameworks from the literature: inventory balancing, which "reserves" a portion of each resource for high-reward customer types that may arrive later; and online learning, which shows how to "explore" the resource consumption distributions of each customer type under different actions. We define an auxiliary problem that allows existing competitive ratio and regret bounds to be seamlessly integrated. Furthermore, we show that the performance guarantee generated by our framework is tight, using the special case of the online bipartite matching problem with unknown match probabilities. Finally, we demonstrate the practicality and efficacy of algorithms generated by our framework using a publicly available hotel data set.
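To make the interplay of the two frameworks concrete, the following is a minimal Python sketch for the special case mentioned above, online matching with unknown match probabilities. It is an illustration under stated assumptions, not the paper's algorithm: the exponential penalty function `psi`, the UCB1-style confidence bonus in `ucb`, the scoring rule that multiplies the two, and all function and parameter names are choices made here for illustration. Each action offers one unit of a single resource, and consumption is a Bernoulli draw with a probability the algorithm does not see.

```python
import math
import random


def psi(remaining_fraction):
    """Inventory-balancing discount (assumed exponential form).

    Equals 1 when the resource is untouched and 0 when it is depleted,
    so scarce resources are held back for later arrivals.
    """
    return (1.0 - math.exp(-remaining_fraction)) / (1.0 - math.exp(-1.0))


def ucb(successes, trials, t):
    """Optimistic estimate of a match probability (UCB1-style bonus, assumed)."""
    if trials == 0:
        return 1.0  # untried pairs are treated optimistically
    bonus = math.sqrt(2.0 * math.log(max(t, 2)) / trials)
    return min(1.0, successes / trials + bonus)


def choose_resource(ctype, t, rewards, capacity, remaining, successes, trials):
    """Pick the resource with the highest optimistic, inventory-discounted reward."""
    best, best_score = None, 0.0
    for i in range(len(rewards)):
        if remaining[i] == 0:
            continue  # depleted resources cannot be offered
        estimate = ucb(successes[ctype][i], trials[ctype][i], t)
        score = estimate * rewards[i] * psi(remaining[i] / capacity[i])
        if score > best_score:
            best, best_score = i, score
    return best


def simulate(arrivals, true_prob, rewards, capacity, seed=0):
    """Run the sketch on a fixed arrival sequence of customer-type indices.

    true_prob[j][i] is the hidden probability that type j consumes resource i;
    it is used only to draw outcomes, never by the decision rule.
    """
    rng = random.Random(seed)
    n_types, n_res = len(true_prob), len(rewards)
    remaining = list(capacity)
    successes = [[0] * n_res for _ in range(n_types)]
    trials = [[0] * n_res for _ in range(n_types)]
    total = 0.0
    for t, ctype in enumerate(arrivals, start=1):
        i = choose_resource(ctype, t, rewards, capacity,
                            remaining, successes, trials)
        if i is None:
            continue  # everything is sold out
        trials[ctype][i] += 1
        if rng.random() < true_prob[ctype][i]:
            successes[ctype][i] += 1
            remaining[i] -= 1
            total += rewards[i]
    return total, remaining


if __name__ == "__main__":
    # Hypothetical instance: two customer types, two resources.
    arrivals = [0, 1] * 50
    true_prob = [[0.9, 0.2], [0.3, 0.8]]
    print(simulate(arrivals, true_prob, rewards=[1.0, 2.0], capacity=[10, 10]))
```

The design choice worth noting is that the learning component only supplies the optimistic probability estimate, while the balancing component only supplies the discount, so either piece could be swapped for another method without touching the other.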