Online Planning with Offline Simulation
45 Pages. Posted: 27 Nov 2020
Date Written: October 12, 2020
One of the central issues in (finite-horizon) online planning problems is to capture the impact of real-time decisions on the subsequent states of the system and on the performance over the remaining time horizon (the cost-to-go function). Solving the problem exactly often leads to intractable dynamic programming problems. In this paper, we propose a computationally efficient approach that attains near-optimal performance in non-stationary environments. More specifically, we study a general class of online planning problems with concave objective functions and (global) feasibility constraints. A wide range of problems in supply chain management, online advertising, network revenue management, and other areas can be appropriately modelled using this online planning framework. Leveraging the "gradient" information obtained from offline simulation (generated from the distributional information), we develop a generic approach to facilitate online planning for this class of problems. Furthermore, our proposed approach produces near-optimal solutions with sublinear regret and satisfies the feasibility constraints with high probability. We present extensive numerical evidence to validate the performance of this approach, and discuss its improvement over existing techniques that assume the underlying environment is stationary.
Keywords: Online Planning; Non-Stationary Environment; Distributional Information; Offline Simulation
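As a rough illustration of how "gradient" information from offline simulation can guide online decisions, the sketch below works through a toy accept/reject problem with a single resource budget. It is not the paper's algorithm: the reward model, the function names (sample_path, estimate_price, plan_online), and the static threshold rule are all assumptions made for this example. Offline, the code simulates reward paths from a known non-stationary distribution and averages the marginal value of one extra unit of budget in hindsight; online, it accepts a request only if its reward meets that estimated price and budget remains.

```python
import random


def sample_path(horizon, rng):
    # Hypothetical non-stationary rewards: the mean drifts upward over time.
    return [rng.uniform(0.0, 1.0) + t / horizon for t in range(horizon)]


def estimate_price(horizon, budget, n_sims, rng):
    # Offline simulation. In hindsight the best plan accepts the `budget`
    # largest rewards, so the marginal value of one more unit of budget is
    # the (budget + 1)-th largest reward. Average it over simulated paths.
    total = 0.0
    for _ in range(n_sims):
        rewards = sorted(sample_path(horizon, rng), reverse=True)
        total += rewards[budget] if budget < horizon else 0.0
    return total / n_sims


def plan_online(rewards, budget, price):
    # Online phase: accept a request if its reward is at least the estimated
    # price and budget remains, so the budget constraint is never violated.
    accepted, value = 0, 0.0
    for r in rewards:
        if accepted < budget and r >= price:
            accepted += 1
            value += r
    return accepted, value


rng = random.Random(0)
horizon, budget = 100, 20
price = estimate_price(horizon, budget, 200, rng)
accepted, value = plan_online(sample_path(horizon, rng), budget, price)
print(f"price={price:.3f} accepted={accepted} value={value:.3f}")
```

A single static price is the simplest choice here; in a strongly non-stationary setting one would more plausibly re-estimate the price as time and remaining budget change, which this sketch leaves out.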