Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management

16 Pages. Posted: 9 Aug 2021. Last revised: 8 Oct 2021.

Pavithra Harsha

IBM Research

Ashish Jagmohan

IBM

Jayant Kalagnanam

IBM Corporation - Thomas J. Watson Research Center

Brian Quanz

affiliation not provided to SSRN

Divya Singhvi

New York University (NYU) - Leonard N. Stern School of Business

Date Written: May 28, 2021

Abstract

Reinforcement Learning (RL) has led to considerable breakthroughs in diverse areas such as robotics, games, and many others. But the application of RL to complex real-world decision-making problems remains limited. Many problems in Operations Management (inventory and revenue management, for example) are characterized by large action spaces and stochastic system dynamics. These characteristics make such problems considerably harder for existing RL methods, which rely on enumeration techniques to solve per-step action problems. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. Analytically, we show that, for a given critic, the learned policy in each iteration converges to the optimal policy as the number of samples of the underlying uncertainty goes to infinity. Practically, we show that a properly selected discretization of the underlying uncertainty distribution can yield a near-optimal actor policy even with very few samples. We then apply our algorithm to real-world inventory management problems with complex supply chain structures and show that PARL outperforms state-of-the-art RL and inventory optimization methods in these settings. We find that PARL outperforms a commonly used base-stock heuristic by 51.3% and RL-based methods by up to 9.58% on average across different supply chain environments.
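
To make the sample average approximation idea concrete, the following is a minimal sketch of a per-step actor problem for a toy single-node inventory system. It is not the authors' implementation: the abstract describes solving this step with integer programming, whereas the sketch simply enumerates a small integer action set so that it stays self-contained. The function names, the cost parameters, and the linear stand-in critic are all hypothetical and chosen only for illustration.

```python
import random


def saa_best_order(inventory, demand_samples, value_fn, max_order,
                   price=5.0, order_cost=2.0, holding_cost=1.0, gamma=0.95):
    """Return (order, score) maximizing the sample-average one-step objective.

    The expectation over demand is replaced by an average over
    `demand_samples`. Enumeration over 0..max_order stands in for the
    integer program mentioned in the abstract (an assumption made here
    for brevity).
    """
    best_order, best_score = 0, float("-inf")
    for order in range(max_order + 1):
        total = 0.0
        for demand in demand_samples:
            on_hand = inventory + order
            sales = min(on_hand, demand)
            next_inventory = on_hand - sales
            # Hypothetical reward: revenue minus ordering and holding costs.
            reward = (price * sales
                      - order_cost * order
                      - holding_cost * next_inventory)
            # Add the critic's discounted estimate of the next state's value.
            total += reward + gamma * value_fn(next_inventory)
        score = total / len(demand_samples)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score


def toy_critic(next_inventory):
    """Hypothetical critic: leftover stock has a small linear future value."""
    return 1.5 * next_inventory


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.randint(0, 10) for _ in range(200)]
    order, score = saa_best_order(inventory=2, demand_samples=samples,
                                  value_fn=toy_critic, max_order=15)
    print("chosen order:", order, "sample-average objective:", round(score, 2))
```

In this sketch, increasing the number of demand samples makes the averaged objective a closer estimate of the true expectation, which is the intuition behind the convergence claim in the abstract. With a large or combinatorial action space, the enumeration loop would become impractical, which is the situation the abstract says PARL targets.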

Keywords: Reinforcement Learning, Inventory Management

Suggested Citation

Harsha, Pavithra and Jagmohan, Ashish and Kalagnanam, Jayant and Quanz, Brian and Singhvi, Divya, Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management (May 28, 2021). Available at SSRN: https://ssrn.com/abstract=3901070 or http://dx.doi.org/10.2139/ssrn.3901070

Pavithra Harsha

IBM Research

T. J. Watson Research Center
Yorktown Heights, NY 10598
United States

Ashish Jagmohan

IBM

United States

Jayant Kalagnanam

IBM Corporation - Thomas J. Watson Research Center

Route 134
Kitchawan Road
Yorktown Heights, NY 10598
United States

Brian Quanz

affiliation not provided to SSRN

Divya Singhvi (Contact Author)

New York University (NYU) - Leonard N. Stern School of Business

44 West 4th Street
Suite 9-160
New York, NY 10012
United States
