A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers

50 Pages Posted: 19 Oct 2020

See all articles by Tengyuan Liang

Tengyuan Liang

University of Chicago - Booth School of Business

Pragya Sur

affiliation not provided to SSRN

Date Written: October 16, 2020

Abstract

This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives. We consider the setting where the number of features (weak learners) p scales with the sample size n, in an over-parametrized regime. Under a broad class of statistical models, we provide an exact analysis of the generalization error of boosting, when the algorithm interpolates the training data and maximizes the empirical L1-margin. The relation between the boosting test error and the optimal Bayes error is pinned down explicitly. In turn, these precise characterizations resolve several open questions raised in [15, 81] surrounding boosting. On the computational front, we provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Furthermore, we discover that the larger the overparametrization ratio p/n, the smaller the proportion of active features (with zero initialization), and the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum L1-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system, using Gaussian comparison techniques and a novel uniform deviation argument. Variants of AdaBoost corresponding to general Lq geometry, for q > 1, are also presented, together with an exact analysis of the high-dimensional generalization and optimization behavior of a class of these algorithms.

Suggested Citation

Liang, Tengyuan and Sur, Pragya, A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers (October 16, 2020). University of Chicago, Becker Friedman Institute for Economics Working Paper No. 2020-152, Available at SSRN: https://ssrn.com/abstract=3714013 or http://dx.doi.org/10.2139/ssrn.3714013

Tengyuan Liang (Contact Author)

University of Chicago - Booth School of Business ( email )

Pragya Sur

affiliation not provided to SSRN

Here is the Coronavirus
related research on SSRN

Paper statistics

Downloads
4
Abstract Views
75
PlumX Metrics