A Reality Check for Credit Default Models
21 Pages. Posted: 17 Sep 2011. Last revised: 19 Sep 2011
Date Written: August 31, 2011
We propose a model selection methodology for credit default modeling in the presence of a large number of variables and candidate models. Accurate credit default models are critical to financial institutions for making underwriting and pricing decisions that maximize profit and mitigate losses. Credit default modeling routinely involves large data sets and an extremely large set of candidate models, so statistical inference must be carried out under a multiple hypothesis-testing scheme. An unguarded use of single-inference procedures, or of recently popular data snooping techniques such as variable reduction via decision tree analysis and stepwise procedures, leaves a modeler at risk of making numerous false statistical discoveries; that is, in data-rich environments pure chance makes the likelihood of a type I error extremely high. To mitigate these concerns, we control the false discovery rate in our model selection procedure and conduct inference that allows for dependent p-values. A Monte Carlo study shows that in large data sets with high collinearity between observations, a naïve data snooping approach leads to multiple false discoveries and a reduction in prediction accuracy. An empirical application of the proposed methodology uses the Office of the Comptroller of the Currency Consumer Credit Database, a large random sample of individual and tradeline data from one of the three national credit bureaus between 1999 and 2009.
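To illustrate what controlling the false discovery rate under dependent p-values can look like in practice, the following is a minimal sketch of the Benjamini-Yekutieli step-up procedure, which is a standard method that remains valid under arbitrary dependence. The abstract does not state which procedure the authors use, so this is an assumption chosen for illustration only; the function name `benjamini_yekutieli` and the parameter `alpha` are hypothetical and do not come from the paper.

```python
import numpy as np


def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a boolean mask of rejected hypotheses.

    Illustrative sketch (not the paper's method): Benjamini-Yekutieli
    step-up procedure, which controls the false discovery rate at level
    `alpha` under arbitrary dependence between p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return np.zeros(0, dtype=bool)

    # Harmonic-sum correction c(m) = 1 + 1/2 + ... + 1/m for dependence.
    c_m = np.sum(1.0 / np.arange(1, m + 1))

    # Compare the k-th smallest p-value with alpha * k / (m * c(m)).
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / (m * c_m)
    below = p[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: reject every hypothesis ranked at or below the
        # largest rank whose p-value meets its threshold.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    # Simulated p-values for candidate variables with no true effect
    # (an assumption for this demo; not data from the paper).
    rng = np.random.default_rng(0)
    p_null = rng.uniform(size=1000)
    print("naive rejections at 0.05:", int((p_null < 0.05).sum()))
    print("BY rejections at 0.05:", int(benjamini_yekutieli(p_null).sum()))
```

Running the demo contrasts a single-inference rule, which flags any variable with p < 0.05, against the corrected rule on variables that have no true effect. The harmonic-sum factor makes the procedure conservative, which is the price of not assuming independence among the tests.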
Keywords: false discoveries, credit default, data mining, multiple hypothesis testing
JEL Classification: B41, C12