Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

32 Pages. Posted: 27 Apr 2022

Rama Cont

University of Oxford

Alain Rossier

University of Oxford - Mathematical Institute

Renyuan Xu

University of Southern California - Epstein Department of Industrial & Systems Engineering

Date Written: April 15, 2022

Abstract

We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
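The setting studied in the paper can be sketched numerically. Below is a minimal, self-contained illustration (not the authors' code) of gradient descent on a deep residual network with constant width and a smooth activation (tanh); the 1/L residual scaling, the regression target, and all hyperparameters are illustrative assumptions, chosen only to show the loss decreasing along the gradient descent path and to expose the trained weights as a function of the layer index.

```python
import numpy as np

rng = np.random.default_rng(0)

# Deep residual network with constant width d and smooth activation:
#   h_{l+1} = h_l + (1/L) * W_l tanh(h_l),   l = 0, ..., L-1
# The 1/L scaling is one common depth scaling; the paper studies the
# behaviour of the trained weights as L tends to infinity.
d, L, n = 4, 64, 32                      # width, depth, sample size
X = rng.normal(size=(n, d))              # inputs
y = np.sin(X.sum(axis=1))                # smooth regression target (illustrative)

W = [0.1 * rng.normal(size=(d, d)) for _ in range(L)]
a = rng.normal(size=d) / np.sqrt(d)      # fixed linear readout

def forward(W, X):
    """Return hidden states h_0, ..., h_L of the residual recursion."""
    hs = [X]
    h = X
    for Wl in W:
        h = h + np.tanh(h) @ Wl.T / L
        hs.append(h)
    return hs

def loss(W):
    h = forward(W, X)[-1]
    return 0.5 * np.mean((h @ a - y) ** 2)

def grads(W):
    """Backpropagate the squared loss through the residual recursion."""
    hs = forward(W, X)
    r = (hs[-1] @ a - y) / n             # d(loss)/d(prediction), per sample
    gh = np.outer(r, a)                  # d(loss)/d(h_L)
    gW = [None] * L
    for l in range(L - 1, -1, -1):
        t = np.tanh(hs[l])
        gW[l] = gh.T @ t / L                         # gradient w.r.t. W_l
        gh = gh + ((gh @ W[l]) * (1 - t ** 2)) / L   # backprop to h_l
    return gW

# Plain gradient descent on the layer weights.
eta = 0.2
losses = [loss(W)]
for _ in range(300):
    gW = grads(W)
    W = [Wl - eta * gl for Wl, gl in zip(W, gW)]
    losses.append(loss(W))
```

After training, `losses` should be decreasing, and the list `W` gives the trained weights indexed by layer; plotting an entry of `W[l]` against `l/L` for increasing `L` is one way to visualize the depth scaling limit the abstract refers to.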

Keywords: Neural Network, Residual Network, Deep Learning, Gradient Descent, Implicit Regularization

JEL Classification: C6

Suggested Citation

Cont, Rama and Rossier, Alain and Xu, Renyuan, Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks (April 15, 2022). Available at SSRN: https://ssrn.com/abstract=4084172 or http://dx.doi.org/10.2139/ssrn.4084172

Rama Cont

University of Oxford

Mathematical Institute
Oxford, OX2 6GG
United Kingdom

HOME PAGE: https://www.maths.ox.ac.uk/people/rama.cont

Alain Rossier

University of Oxford - Mathematical Institute

Radcliffe Observatory, Andrew Wiles Building
Woodstock Rd
Oxford, Oxfordshire OX2 6GG
United Kingdom

Renyuan Xu (Contact Author)

University of Southern California - Epstein Department of Industrial & Systems Engineering

United States

HOME PAGE: https://renyuanxu.github.io

Paper statistics: 51 downloads, 116 abstract views, download rank 514,110.