Reinforcement Learning for Search Tree Size Minimization in Constraint Programming: New Results on Scheduling Benchmarks
55 Pages Posted: 27 Aug 2024
Abstract
One of the significant search algorithms for scheduling with Constraint Programming (CP) is Failure-Directed Search (FDS), a generic, complete search algorithm designed to explore the search space efficiently. Despite its genericity, FDS has proved optimality for, or optimally solved, numerous scheduling instances that had been open for decades. In this paper, we focus on the properties of FDS and show that minimizing the size of its search tree, guided by branching decisions (choices) ordered by continuously updated ratings, has the same properties as arm selection in the Multi-armed Bandit (MAB) problem. Since the exploration-exploitation dilemma, a key aspect of the MAB problem, was not originally considered in FDS, we apply various reinforcement learning algorithms for the MAB problem to FDS. We extend them with problem-specific improvements and further enhance the performance of FDS by parameter tuning. For the performance evaluation, we use the Job Shop Scheduling Problem (JSSP) and the Resource-Constrained Project Scheduling Problem (RCPSP), two of the most fundamental scheduling problems, studied for almost 60 years. The results show that the improved FDS, implemented in a new solver called OptalCP, is 1.7 times faster on JSSP and 2.5 times faster on RCPSP benchmarks than the original FDS. It is also 3.5 times faster on JSSP and 2.1 times faster on RCPSP than the current state-of-the-art FDS algorithm in IBM CP Optimizer 22.1. Another contribution is the improvement of existing state-of-the-art lower bounds for these problems. Among all open instances in standard benchmark sets, the lower bounds of 78 out of 84 JSSP instances and 226 out of 393 RCPSP instances were improved using just a 900-second time limit per instance. A small number of those instances were also closed (the optimal makespan was found).
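To make the analogy with arm selection concrete, the following is a minimal sketch of a textbook UCB1 bandit choosing among a fixed set of branching choices. It is not the algorithm evaluated in the paper and not OptalCP's or CP Optimizer's API: the class name `UCB1ChoiceSelector`, the helper `simulate`, and the reward convention (a value in [0, 1], higher meaning the choice was more useful for shrinking the search tree) are all assumptions made for illustration.

```python
import math
import random


class UCB1ChoiceSelector:
    """Textbook UCB1 over a fixed set of branching choices (arms).

    Hypothetical illustration only; rewards are assumed to lie in [0, 1].
    """

    def __init__(self, n_choices, exploration=math.sqrt(2)):
        self.counts = [0] * n_choices   # how often each choice was tried
        self.means = [0.0] * n_choices  # running mean reward per choice
        self.exploration = exploration  # sqrt(2) gives the classic UCB1 bonus
        self.total = 0                  # total number of updates so far

    def select(self):
        # Try every choice once before relying on confidence bounds.
        for index, count in enumerate(self.counts):
            if count == 0:
                return index
        log_total = math.log(self.total)
        scores = [
            mean + self.exploration * math.sqrt(log_total / count)
            for mean, count in zip(self.means, self.counts)
        ]
        # Pick the choice with the highest upper confidence bound.
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, choice, reward):
        # Incremental update of the running mean for the chosen arm.
        self.counts[choice] += 1
        self.total += 1
        self.means[choice] += (reward - self.means[choice]) / self.counts[choice]


def simulate(true_means, steps=2000, seed=0):
    """Run the selector against Bernoulli rewards with the given success rates."""
    rng = random.Random(seed)
    selector = UCB1ChoiceSelector(len(true_means))
    for _ in range(steps):
        choice = selector.select()
        reward = 1.0 if rng.random() < true_means[choice] else 0.0
        selector.update(choice, reward)
    return selector


if __name__ == "__main__":
    result = simulate([0.2, 0.5, 0.8])
    print(result.counts)
```

The exploration bonus shrinks as a choice is tried more often, so rarely tried choices are revisited occasionally while the choice with the best observed mean reward is selected most of the time. This is the exploration-exploitation trade-off referred to above.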
Keywords: Constraint Programming, Reinforcement Learning, Discrete Optimization, Scheduling, Tree Search, Heuristics