Understanding the Performance of Machine Learning Models to Predict Credit Default: A Novel Approach for Supervisory Evaluation

44 pages. Posted: 27 Jan 2021

Date Written: January 27, 2021

Abstract

In this paper we study the performance of several machine learning (ML) models for credit default prediction, using a unique, anonymized database from a major Spanish bank. We compare the statistical performance of a simple, traditionally used model, logistic regression (Logit), with that of more advanced ones: Lasso-penalized logistic regression, Classification and Regression Tree (CART), Random Forest, XGBoost and deep neural networks. Following the process deployed for the supervisory validation of Internal Ratings-Based (IRB) systems, we examine the benefits of using ML in terms of predictive power, both in classification and in calibration. By running a simulation exercise for different sample sizes and numbers of features, we are able to isolate the information advantage associated with access to large amounts of data and to measure the ML model advantage. Although ML models outperform Logit in both classification and calibration, more complex ML algorithms do not necessarily predict better. We then translate this statistical performance into economic impact by estimating the savings in regulatory capital when ML models, instead of a simpler model like Lasso, are used to compute risk-weighted assets. Our benchmark results show that implementing XGBoost could yield savings of 12.4% to 17% in regulatory capital requirements under the IRB approach. We conclude that the potential economic benefits for institutions would be significant, which justifies further research to better understand all the risks embedded in ML models.
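
The distinction the abstract draws between classification and calibration can be illustrated with a minimal sketch. It assumes AUC-ROC as the classification measure and the Brier score as the calibration measure; the abstract does not name the metrics used in the paper, so both choices, the function names and the toy data below are hypothetical illustrations rather than the authors' procedure.

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Classification power: probability that a randomly chosen defaulter
    receives a higher score than a randomly chosen non-defaulter
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one default and one non-default")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Calibration: mean squared gap between the predicted probability of
    default and the realized outcome (lower is better)."""
    if len(y_true) == 0:
        raise ValueError("empty sample")
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical toy sample: 1 = default, 0 = no default.
y = [0, 0, 1, 0, 1, 0]
# Two hypothetical models that rank borrowers identically but assign
# different probability levels.
pd_model_a = [0.10, 0.20, 0.60, 0.30, 0.70, 0.05]
pd_model_b = [0.40, 0.45, 0.55, 0.50, 0.60, 0.35]

for name, pd_hat in (("model A", pd_model_a), ("model B", pd_model_b)):
    print(name, "AUC:", auc_roc(y, pd_hat), "Brier:", brier_score(y, pd_hat))
```

In this toy sample both models order the borrowers in the same way and so obtain the same AUC, while their Brier scores differ. This is why a supervisory evaluation would look at the two dimensions separately.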

Keywords: machine learning, credit risk, prediction, probability of default, IRB system

JEL Classification: C45, C38, G21

Suggested Citation

Alonso, Andrés and Carbó, José Manuel, Understanding the Performance of Machine Learning Models to Predict Credit Default: A Novel Approach for Supervisory Evaluation (January 27, 2021). Banco de España Working Paper No. 2105. Available at SSRN: https://ssrn.com/abstract=3774075 or http://dx.doi.org/10.2139/ssrn.3774075

Andrés Alonso (Contact Author)

Banco de España

Alcala 50
Madrid 28014
Spain

José Manuel Carbó

Banco de España

Alcala 50
Madrid 28014
Spain
