Linear Classifiers Under Infinite Imbalance

44 Pages Posted: 11 Jun 2021

Paul Glasserman

Columbia Business School

Mike Li

Columbia Business School

Date Written: June 9, 2021

Abstract


We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of the other remains fixed. The coefficients of the classifier minimize an expected loss specified through a weight function. We show that for a broad class of weight functions, the intercept diverges but the rest of the coefficient vector has a finite limit under infinite imbalance, extending prior work on logistic regression. The limit depends on the left tail of the weight function, for which we distinguish three cases: bounded, asymptotically polynomial, and asymptotically exponential. The limiting coefficient vectors reflect robustness or conservatism properties in the sense that they optimize against certain worst-case alternatives. In the bounded and polynomial cases, the limit is equivalent to an implicit choice of upsampling distribution for the minority class. We apply these ideas in a credit risk setting, with particular emphasis on performance in the high-sensitivity and high-specificity regions.
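The logistic regression special case that the abstract cites as prior work can be illustrated numerically. The following is a minimal sketch, not the authors' method or code: it assumes a one-dimensional feature, a hypothetical minority sample of three points, and a standard normal majority class represented by a weighted grid whose total weight N stands in for the majority sample size. The function name `fit_weighted_logistic` and all numerical choices are illustrative assumptions.

```python
import numpy as np

def fit_weighted_logistic(x, y, w, iters=50):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by Newton's method with weights w."""
    X = np.column_stack([np.ones_like(x), x])
    # Start at the intercept-only fit so the Newton steps stay well behaved.
    theta = np.array([np.log(np.sum(w * y) / np.sum(w * (1.0 - y))), 0.0])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        theta = theta + np.linalg.solve(hess, grad)
    return theta

# Hypothetical minority sample of fixed size, with mean 0.5 (an assumption).
x_min = np.array([0.0, 0.5, 1.0])

# Majority class: N(0, 1) approximated on a grid (an assumption);
# the normalized grid weights are later scaled to sum to N.
grid = np.linspace(-8.0, 8.0, 2001)
density = np.exp(-0.5 * grid**2)
density /= density.sum()

x = np.concatenate([x_min, grid])
y = np.concatenate([np.ones_like(x_min), np.zeros_like(grid)])

results = []
for N in [1e2, 1e3, 1e4, 1e5, 1e6]:
    # Minority points keep weight 1; the majority mass grows with N.
    w = np.concatenate([np.ones_like(x_min), N * density])
    a, b = fit_weighted_logistic(x, y, w)
    results.append((N, a, b))
    print(f"N={N:9.0f}  intercept={a:8.3f}  slope={b:6.3f}")
```

Under these assumptions, the fitted intercept should keep falling as N grows (by roughly log 10 per tenfold increase), while the slope should settle near 0.5, the mean of the assumed minority sample. This toy setup covers only the logistic loss; the weight functions with bounded, polynomial, or exponential left tails studied in the paper are not modeled here.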

Keywords: classification, statistics, data imbalance, credit risk

JEL Classification: C38, C18, C44, G21

Suggested Citation

Glasserman, Paul and Li, Mike, Linear Classifiers Under Infinite Imbalance (June 9, 2021). Available at SSRN: https://ssrn.com/abstract=3863653 or http://dx.doi.org/10.2139/ssrn.3863653

Paul Glasserman

Columbia Business School

3022 Broadway
403 Uris Hall
New York, NY 10027
United States
212-854-4102 (Phone)
212-316-9180 (Fax)

Mike Li (Contact Author)

Columbia Business School

New York, NY
United States
