Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management

39 Pages Posted: 5 Dec 2022

Xiaotian Liu

Peking University

Ming Hu

University of Toronto - Rotman School of Management

Yijie Peng

Peking University

Yaodong Yang

Peking University

Date Written: October 30, 2022

Abstract

We apply Multi-Agent Deep Reinforcement Learning (MADRL) to multi-echelon inventory management problems and evaluate how well MADRL minimizes the overall costs of a supply chain. We also examine whether the upfront-only information-sharing mechanism used in MADRL helps alleviate the bullwhip effect in a supply chain. We apply Heterogeneous-Agent Proximal Policy Optimization (HAPPO) to multi-echelon inventory management problems in both a serial supply chain and a supply chain network. Our results show that policies constructed by HAPPO achieve lower overall costs than policies constructed by single-agent deep reinforcement learning and other heuristic policies. HAPPO also produces a less significant bullwhip effect than policies constructed by single-agent deep reinforcement learning, where information is not shared among actors. Somewhat surprisingly, when applying HAPPO, the system achieves the lowest overall costs when the minimization target for each actor is a combination of its own costs and the overall costs of the system, and the fully self-interested reward target performs near-optimally, whereas one would expect that using the overall costs of the system as the reward target for each actor would be optimal in training the models. Our results provide a new perspective on the benefit of information sharing inside the supply chain, which helps alleviate the bullwhip effect and improve the overall performance of the system. Both upfront information sharing and action coordination among actors during model training are essential, the former more so, for improving a supply chain's overall performance when applying MADRL. Neither fully self-interested actors nor fully system-focused actors lead to optimal performance of the policies learned and constructed by MADRL. Our results also verify MADRL's potential in solving various multi-echelon inventory management problems with complex supply chain structures and in non-stationary market environments.
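
The abstract does not spell out the reward target or the bullwhip measure, so the following Python sketch is only an illustration under stated assumptions. It assumes each actor's minimization target is a linear blend of its own cost and the system cost, that the system cost is the sum of the actors' costs, and that the bullwhip effect is measured as the ratio of order variance to demand variance (a common textbook measure). The names `blended_costs`, `bullwhip_ratio`, and `alpha` are hypothetical and do not come from the paper.

```python
from statistics import pvariance


def blended_costs(own_costs, alpha):
    """Return one minimization target per actor.

    Assumed form: alpha * own cost + (1 - alpha) * system cost, where the
    system cost is taken to be the sum of all actors' costs.
    alpha = 1.0 is fully self-interested; alpha = 0.0 is fully system-focused.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    system_cost = sum(own_costs)
    return [alpha * c + (1.0 - alpha) * system_cost for c in own_costs]


def bullwhip_ratio(orders, demand):
    """Return Var(orders) / Var(demand) for one echelon.

    A ratio above 1 means the orders this echelon places vary more than
    the demand it observes, i.e. variability is amplified upstream.
    """
    demand_var = pvariance(demand)
    if demand_var == 0:
        raise ValueError("demand variance is zero; ratio is undefined")
    return pvariance(orders) / demand_var


# Illustrative numbers only, not results from the paper.
per_actor_costs = [4.0, 6.0, 10.0]
print(blended_costs(per_actor_costs, alpha=0.5))  # [12.0, 13.0, 15.0]
print(bullwhip_ratio([8, 12, 6, 14], [9, 11, 10, 10]))
```

Under these assumptions, intermediate values of `alpha` correspond to the combined target that the abstract reports as giving the lowest overall costs; the weight actually used by the authors is not stated here.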

Keywords: Multi-Echelon Inventory Management, Multi-Agent Deep Reinforcement Learning, Bullwhip Effect

Suggested Citation

Liu, Xiaotian and Hu, Ming and Peng, Yijie and Yang, Yaodong, Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management (October 30, 2022). Available at SSRN: https://ssrn.com/abstract=4262186 or http://dx.doi.org/10.2139/ssrn.4262186

Xiaotian Liu

Peking University

Ming Hu

University of Toronto - Rotman School of Management

105 St. George St
Toronto, ON M5S 3E6
Canada
416-946-5207 (Phone)

HOME PAGE: http://ming.hu

Yijie Peng (Contact Author)

Peking University

No 5 Yiheyuan Rd
Haidian District
Beijing, Beijing 100871
China

Yaodong Yang

Peking University

No. 38 Xueyuan Road
Haidian District
Beijing, 100871
China
