The Mathematics of Q-Learning and the Hamilton-Jacobi-Bellman Equation
7 Pages Posted: 10 Feb 2025
Date Written: January 05, 2025
Abstract
We provide a comprehensive mathematical analysis of the relationship between Q-learning, a canonical model-free reinforcement learning (RL) algorithm, and the Hamilton-Jacobi-Bellman (HJB) equation, a fundamental partial differential equation (PDE) in continuous-time optimal control. By rigorously examining the limit as the time discretization of a Markov Decision Process (MDP) vanishes, we connect the discrete Bellman optimality equation to the continuous-time HJB equation, showing that Q-learning approximations converge in the viscosity solution sense to the unique solution of the HJB equation. Our treatment leverages the theory of viscosity solutions, comparison principles, nonlinear semigroups, and monotone approximation schemes from PDE theory. We detail the conditions required for existence, uniqueness, and stability of viscosity solutions, and explore how convergence, stability, and error analysis from PDE approximation schemes map onto conditions for Q-learning's convergence. The paper also generalizes the framework to risk-sensitive control problems, mean-field games, and robust/adversarial scenarios, each of which leads to more complex PDEs, such as risk-sensitive or Isaacs-type HJB equations. We further discuss how advanced PDE approximation techniques, including monotone finite difference schemes, semi-Lagrangian methods, and deep neural PDE solvers (such as physics-informed neural networks), can inform the design of RL algorithms. This unified perspective encourages cross-fertilization between PDE theory and RL, guiding the creation of more robust, efficient, and theoretically grounded algorithms for continuous control and beyond.
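To illustrate the vanishing-discretization limit described in the abstract, the following is a minimal sketch in standard notation (the symbols b, sigma, r, and rho are introduced here for illustration and need not match the paper's exact formulation). For a controlled diffusion with drift b, diffusion coefficient sigma, running reward r, and discount rate rho, the dynamic programming principle over a step of size Delta t reads

\[
V(x) \;=\; \sup_{a}\, \mathbb{E}\!\left[\, r(x,a)\,\Delta t \;+\; e^{-\rho \Delta t}\, V\!\left(x_{\Delta t}\right) \,\middle|\, x_0 = x \right],
\]

and a formal Taylor expansion as \(\Delta t \to 0\) yields the stationary HJB equation

\[
\rho\, V(x) \;=\; \sup_{a}\left\{\, r(x,a) \;+\; b(x,a)\cdot \nabla V(x) \;+\; \tfrac{1}{2}\,\mathrm{tr}\!\left(\sigma\sigma^{\top}(x,a)\,\nabla^{2} V(x)\right) \right\},
\]

whose solution is generally understood in the viscosity sense when V is not classically differentiable.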
Keywords: Q-Learning, Hamilton–Jacobi–Bellman Equation, Reinforcement Learning, Viscosity Solutions, Optimal Control
JEL Classification: C61, C63, C65, D81, D83, G11