Enhancing Online Food Delivery with Transfer Points: A Decompose-Then-Optimize Approach via Hierarchical Reinforcement Learning

Zhang, Xinyuan; Luo, Qi; Qian, Xinwu

48 Pages. Posted: 5 May 2025. Last revised: 24 May 2025

Xinyuan Zhang

Qi Luo

University of Iowa - Department of Business Analytics

Xinwu Qian

Rice University

Date Written: April 01, 2025

Abstract

Online food delivery services can reduce operational costs and improve efficiency by consolidating orders with similar origins, destinations, and time windows at intermediate transfer locations. This research investigates the complexity inherent in the online food delivery problem with transfer (OFDP-T) and assesses how optimized routes and courier assignments involving transfer locations can enhance system-wide delivery performance. We propose a novel learning-based decompose-then-optimize framework that manages the exponential growth in problem size introduced by transfer and synchronization decisions and enables adaptive decision-making under uncertainty. The decomposition relies on tight integration between a first-step hierarchical reinforcement learning (HRL) model and a resulting second-step model that can be solved as a linear assignment problem (LAP). Comprehensive experiments on real-world food delivery data demonstrate that combining task-agnostic reward design with LAP-guided policy search significantly improves on the baseline methods. In our case study, the combined approach improves baseline performance by 27.2%, with the reward shaping alone boosting HRL by 37.5% and LAP-guided search adding 6.9%. Notably, even limited use of transfers can yield over 46.6% improvement in route efficiency and a 23.2% gain for the remaining orders. The framework offers a deployable, real-time solution and actionable strategies for coordinating complex delivery operations and improving fleet utilization, supporting more sustainable and scalable food delivery systems.
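
As a point of reference for the second-step model, a linear assignment problem pairs each courier with exactly one task so that the total cost is minimized. The sketch below is illustrative only and is not the authors' implementation: the cost matrix, the function name `solve_lap`, and the interpretation of rows as couriers and columns as order bundles are all assumptions. It solves a tiny instance by exhaustive search; at realistic sizes one would use a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def solve_lap(cost):
    """Solve a small square linear assignment problem by exhaustive search.

    cost[i][j] is the cost of giving task j to agent i.
    Returns (minimum total cost, assignment), where assignment[i] is the
    task index given to agent i. Runtime is O(n!), so this is only
    suitable for tiny illustrative instances.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical data (not from the paper): cost[i][j] is the extra travel
# time, in minutes, if courier i serves order bundle j.
cost = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]

total, assignment = solve_lap(cost)
for courier, bundle in enumerate(assignment):
    print(f"courier {courier} -> bundle {bundle}")
print("total cost:", total)
```

In this toy instance, the greedy choice for courier 2 (bundle 2, cost 1) happens to be part of the optimum, but in general an LAP solver is needed because locally cheapest pairings can conflict across couriers.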

Keywords: Online food delivery problem, hierarchical reinforcement learning, task-agnostic reward design

Suggested Citation

Zhang, Xinyuan and Luo, Qi and Qian, Xinwu, Enhancing Online Food Delivery with Transfer Points: A Decompose-Then-Optimize Approach via Hierarchical Reinforcement Learning (April 01, 2025). Available at SSRN: https://ssrn.com/abstract=5201175