A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization
Abstract
Implicit regularization induced by gradient-based optimization is an important lens for understanding generalization in neural networks. Recent theory explains implicit regularization in the deep matrix factorization (DMF) model by analyzing the trajectory of discrete gradient dynamics during optimization. These discrete gradient dynamics can mathematically characterize the practical learning rates of adaptive gradient methods such as RMSProp. Discrete gradient dynamics analysis has been applied successfully to shallow networks, but the computation becomes prohibitively complex for deep networks. In this work, we introduce an alternative discrete gradient dynamics approach, landscape analysis, to explain the implicit regularization of RMSProp. It focuses on critical regions of the loss landscape, such as saddle points and local minima. We find that increasing the learning rate benefits the saddle-point escaping (SPE) stages. By elucidating implicit regularization through the convergence of RMSProp, we prove that for rank-R matrix reconstruction, DMF converges to a second-order critical point after R stages of SPE. This conclusion is further verified experimentally on a low-rank matrix reconstruction problem.
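To make the setting concrete, the sketch below trains a deep matrix factorization with RMSProp on a low-rank matrix reconstruction task and tracks a rough effective-rank proxy as training proceeds. It is a minimal illustration assuming PyTorch; the depth, matrix size, rank, learning rate, and observation mask are illustrative choices, not the paper's experimental configuration.

```python
# Minimal sketch (not the paper's code): deep matrix factorization (DMF)
# trained with RMSProp to reconstruct a low-rank matrix from partially
# observed entries. All hyperparameters below are illustrative assumptions.
import torch

torch.manual_seed(0)
n, rank, depth = 30, 3, 3          # matrix size, ground-truth rank, DMF depth

# Ground-truth rank-R matrix and a random observation mask.
U = torch.randn(n, rank)
V = torch.randn(rank, n)
M_true = U @ V
mask = (torch.rand(n, n) < 0.3).float()

# DMF: the end-to-end matrix is a product of `depth` square factors,
# initialized near zero so training starts in a low-norm (low-rank) regime.
factors = [torch.nn.Parameter(1e-3 * torch.randn(n, n)) for _ in range(depth)]

def end_to_end(factors):
    W = factors[0]
    for F in factors[1:]:
        W = W @ F
    return W

opt = torch.optim.RMSprop(factors, lr=1e-3)
for step in range(20001):
    opt.zero_grad()
    W = end_to_end(factors)
    # Reconstruction loss on observed entries only.
    loss = ((mask * (W - M_true)) ** 2).sum() / mask.sum()
    loss.backward()
    opt.step()
    if step % 5000 == 0:
        # Effective-rank proxy: singular values above a relative threshold.
        svals = torch.linalg.svdvals(W.detach())
        eff_rank = int((svals > 1e-2 * svals[0]).sum())
        print(f"step {step:6d}  loss {loss.item():.4e}  approx. rank {eff_rank}")
```

Under this kind of setup, the printed effective rank typically grows in discrete stages as training escapes successive saddle regions, which is the qualitative behavior the abstract's R-stage SPE result describes.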
Keywords: Deep learning, implicit regularization, low-rank matrix factorization, discrete gradient dynamics, saddle point