Evidence in Favor of Weight of Evidence and Binning Transformations for Predictive Modeling

39 Pages Posted: 12 Sep 2011

Date Written: September 10, 2011


Weight of Evidence (WOE) transformation of categorical variables is a technique that credit scoring professionals have used for decades. This paper investigates whether using this transformation improves predictive performance. For models without interaction terms, using WOE together with binning (discretization) of numeric variables improves predictive accuracy. The gain from adding WOE transformations without binning is marginal in models without interactions. This is consistent with the excellent results achieved on the Paralyzed Veteran Admin KDD 98 data, where the best performance was achieved using both WOE and binning.
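
As a rough illustration of the transformation described above, the sketch below computes WOE values for a single categorical variable against a binary target. The function name `woe_table`, the additive smoothing constant, and the sign convention (log of a level's share of events over its share of non-events) are assumptions made for illustration and are not taken from the paper; some practitioners use the opposite sign.

```python
import math
from collections import Counter

def woe_table(categories, targets, smoothing=0.5):
    """Map each category level to its WOE for a binary target (1 = event)."""
    events, non_events = Counter(), Counter()
    for level, y in zip(categories, targets):
        if y == 1:
            events[level] += 1
        else:
            non_events[level] += 1
    levels = set(events) | set(non_events)
    # Smoothing (an assumption here) keeps levels with zero events or zero
    # non-events from producing log(0) or a division by zero.
    total_events = sum(events.values()) + smoothing * len(levels)
    total_non_events = sum(non_events.values()) + smoothing * len(levels)
    table = {}
    for level in levels:
        share_events = (events[level] + smoothing) / total_events
        share_non_events = (non_events[level] + smoothing) / total_non_events
        table[level] = math.log(share_events / share_non_events)
    return table
```

In a modeling workflow each raw category would be replaced by its WOE value before fitting, for example, a logistic regression, with the table estimated on training data only.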

For models with interaction terms, the WOE transform improves model performance for 2 out of 3 data sets, and performance is the same for the third data set with or without WOE. WOE tends to improve the performance of logistic regression and of I*-tuned logistic regression while slightly degrading random forest performance. Combined with the I* algorithm, WOE thus yields models whose area under the curve is competitive with winning KDD benchmarks.

The combination of WOE and binning reduces performance for models with interaction terms. This makes sense in retrospect, as binning variables results in a loss of information about interactions among continuous variables.
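
To make the combination of binning and WOE concrete, here is a minimal sketch that discretizes a numeric variable into equal-frequency bins and then replaces each value with the WOE of its bin. The helper names (`quantile_edges`, `assign_bin`, `binned_woe`), the choice of equal-frequency bins, and the smoothing constant are assumptions for illustration; the abstract does not state which binning scheme the paper uses.

```python
import math
from collections import Counter

def quantile_edges(values, n_bins):
    """Cut points that split the sorted values into roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def assign_bin(x, edges):
    """Bin index of x: the number of cut points that are <= x."""
    return sum(1 for edge in edges if x >= edge)

def binned_woe(values, targets, n_bins=3, smoothing=0.5):
    """Return (cut points, WOE-encoded values) for a binary target (1 = event)."""
    edges = quantile_edges(values, n_bins)
    bins = [assign_bin(x, edges) for x in values]
    events, non_events = Counter(), Counter()
    for b, y in zip(bins, targets):
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1
    levels = sorted(set(bins))
    # Smoothed totals (an assumption) so that empty cells do not give log(0).
    total_events = sum(events.values()) + smoothing * len(levels)
    total_non_events = sum(non_events.values()) + smoothing * len(levels)
    woe = {
        b: math.log(((events[b] + smoothing) / total_events)
                    / ((non_events[b] + smoothing) / total_non_events))
        for b in levels
    }
    return edges, [woe[b] for b in bins]
```

Every value that falls in the same bin receives the same encoded value, which is one way to see the information loss noted above: differences within a bin are no longer available to any interaction term built on the encoded variable.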

WOE and binning thus improve model performance when used together in models without interactions, as practitioners have believed. However, when interaction effects exist in the data, they are more predictive than WOE and binning, and WOE should be used alone, since binning can reduce the predictive power of interaction effects. Interactions exist in a data set when random forest outperforms logistic regression out of the box (Sharma, 2011b).

Suggested Citation

Sharma, Dhruv, Evidence in Favor of Weight of Evidence and Binning Transformations for Predictive Modeling (September 10, 2011). Available at SSRN: https://ssrn.com/abstract=1925510 or http://dx.doi.org/10.2139/ssrn.1925510

Dhruv Sharma (Contact Author)

Independent

2023 N. Cleveland St.
Arlington, VA 22201
United States

HOME PAGE: http://theinterdisciplinarian.com/
