Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks
32 Pages Posted: 27 Apr 2022
Date Written: April 15, 2022
We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
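The setting studied in the abstract can be illustrated with a minimal sketch: a residual network of constant width whose blocks are scaled by 1/L, trained by full-batch gradient descent on a synthetic regression task. All dimensions, the 1/L scaling convention, the tanh activation, and the fixed output layer below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): depth L, constant width d, n samples.
L, d, n = 20, 8, 16
X = rng.normal(size=(n, d))                    # inputs
y = np.tanh(X @ rng.normal(size=(d, 1)))       # synthetic targets

W = [0.1 * rng.normal(size=(d, d)) for _ in range(L)]  # residual weights
a = rng.normal(size=(d, 1)) / np.sqrt(d)               # fixed readout layer

def forward(W, X):
    """Residual recursion h_{l+1} = h_l + (1/L) tanh(h_l W_l)."""
    hs = [X]
    for Wl in W:
        hs.append(hs[-1] + (1.0 / L) * np.tanh(hs[-1] @ Wl))
    return hs

def loss(W):
    return 0.5 * np.mean((forward(W, X)[-1] @ a - y) ** 2)

def grads(W):
    """Backpropagation through the residual recursion (full batch)."""
    hs = forward(W, X)
    g = ((hs[-1] @ a - y) / n) @ a.T           # dL/dh_L
    gW = [None] * L
    for l in range(L - 1, -1, -1):
        pre = hs[l] @ W[l]
        s = (1.0 / L) * g * (1 - np.tanh(pre) ** 2)  # dL/dpre
        gW[l] = hs[l].T @ s                    # gradient for W_l
        g = g + s @ W[l].T                     # skip path + residual branch
    return gW

# Plain gradient descent; the loss along this path should decrease.
eta = 0.5
losses = [loss(W)]
for _ in range(200):
    gW = grads(W)
    W = [Wl - eta * Gl for Wl, Gl in zip(W, gW)]
    losses.append(loss(W))

print(losses[0], losses[-1])
```

One can also inspect the trained weights `W[l]` as a function of the layer index l/L to probe, numerically, the kind of regularity in depth that the scaling-limit result describes.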
Keywords: Neural Network, Residual Network, Deep Learning, Gradient Descent, Implicit Regularization
JEL Classification: C6