Scalable Deep Reinforcement Learning in the Non-Stationary Capacitated Lot Sizing Problem

23 pages. Posted: 28 May 2024

Lotte van Hezewijk

Eindhoven University of Technology (TUE)

Nico Dellaert

affiliation not provided to SSRN

Willem van Jaarsveld

Eindhoven University of Technology (TUE)

Abstract

Capacitated lot sizing problems with stochastic and non-stationary demand (SCLSP) are very common in practice. Solving problems with a large number of items using Deep Reinforcement Learning (DRL) is challenging because of the large action space. This paper proposes a new Markov Decision Process (MDP) formulation that decomposes the production quantity decisions in a period into sub-decisions, which reduces the action space dramatically. We demonstrate that applying Deep Controlled Learning (DCL) yields policies that outperform the benchmark heuristic as well as a prior DRL implementation. With the decomposed MDP formulation and the DCL method outlined in this paper, we can solve larger problems than the previous DRL implementation could. Moreover, we train the policy on a non-stationary demand model, which enables us to readily apply the trained policy in dynamic environments where demand changes.
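To give a feel for why decomposing a period's decision can shrink the action space, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: production quantities are integers, every unit of every item consumes one unit of a shared capacity, and setup times are ignored. The function names `joint_action_count` and `decomposed_action_count` are hypothetical, and the one-item-at-a-time decomposition is only one plausible reading of "sub-decisions", not necessarily the authors' formulation.

```python
from math import comb


def joint_action_count(num_items: int, capacity: int) -> int:
    """Count integer production vectors (q_1, ..., q_n) with q_i >= 0
    and q_1 + ... + q_n <= capacity (stars-and-bars argument)."""
    return comb(capacity + num_items, num_items)


def decomposed_action_count(capacity: int) -> int:
    """Upper bound on the choices in a single sub-decision when items are
    decided one at a time: produce 0, 1, ..., capacity units of that item.
    (Assumed decomposition, for illustration only.)"""
    return capacity + 1


if __name__ == "__main__":
    capacity = 20  # hypothetical capacity, in units per period
    for num_items in (2, 5, 10):
        print(
            f"items={num_items:2d}  "
            f"joint={joint_action_count(num_items, capacity)}  "
            f"per sub-decision<={decomposed_action_count(capacity)}"
        )
```

Under these assumptions the joint count grows combinatorially with the number of items, while the size of each sub-decision stays bounded by the capacity; the price is a longer sequence of decisions within each period.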

Keywords: deep reinforcement learning, capacitated lot sizing, non-stationary demand

Suggested Citation

van Hezewijk, Lotte and Dellaert, Nico and van Jaarsveld, Willem, Scalable Deep Reinforcement Learning in the Non-Stationary Capacitated Lot Sizing Problem. Available at SSRN: https://ssrn.com/abstract=4846298 or http://dx.doi.org/10.2139/ssrn.4846298

Lotte van Hezewijk (Contact Author)

Eindhoven University of Technology (TUE)

Nico Dellaert

affiliation not provided to SSRN

Willem van Jaarsveld

Eindhoven University of Technology (TUE)

PO Box 513
Eindhoven, 5600 MB
Netherlands
