I*: Optimizing Logistic Regression to Match Ensemble Performance Using Random Forest Variable Importance

75 Pages. Posted: 6 Jun 2011. Last revised: 6 Sep 2011

Date Written: June 5, 2011

Abstract

An automated directed search procedure called interaction miner, or I*, is outlined. It builds logistic regression models automatically, guided by what random forest variable importance measures suggest about the predictive value of attributes. That interaction effects can be added to regression models through an intelligently directed search shows that predictive models can be built with science rather than art. How important this is remains unclear, but ensemble methods appear to derive their power by extracting information about interaction effects in the data. Once these effects are accounted for, regression models can match or outperform random forests. The goal of the algorithm is to tune regression to outperform ensemble methods, and it is shown to work on three credit data sets. The approach is an automated heuristic motivated by an observation across various credit and behavioral data sets: out of the box, random forest outperforms logistic regression, but once interaction terms chosen using random forest variable importance are added, logistic regression can match or outperform random forest models.
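The abstract does not spell out the search procedure, so the following is only a minimal sketch of one step it implies: turning variable importance scores into candidate interaction terms for a logistic regression. The function names, the feature names, the importance values, and the choice of pairwise products among the top `k` variables are all assumptions made for illustration, not the paper's method. In practice the scores might come from a fitted random forest (for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`).

```python
from itertools import combinations


def top_k_features(importance, k):
    """Return the k feature names with the highest importance scores.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(importance, key=lambda name: (-importance[name], name))
    return ranked[:k]


def add_interactions(rows, importance, k=3):
    """Append pairwise product terms for the k most important features.

    rows: list of dicts mapping feature name -> numeric value.
    importance: dict mapping feature name -> importance score
        (assumed here to come from a previously fitted random forest).
    Returns the augmented rows and the list of feature pairs used.
    """
    selected = top_k_features(importance, k)
    pairs = list(combinations(selected, 2))
    augmented = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for a, b in pairs:
            new_row[f"{a}*{b}"] = row[a] * row[b]
        augmented.append(new_row)
    return augmented, pairs


# Hypothetical importance scores and two toy credit records.
importance = {
    "utilization": 0.40,
    "age": 0.05,
    "delinquencies": 0.35,
    "income": 0.20,
}
rows = [
    {"utilization": 0.9, "age": 30, "delinquencies": 2, "income": 40.0},
    {"utilization": 0.1, "age": 55, "delinquencies": 0, "income": 90.0},
]

augmented, pairs = add_interactions(rows, importance, k=3)
for a, b in pairs:
    print(f"candidate interaction: {a} x {b}")
```

The augmented rows would then be passed to a logistic regression fit, and each candidate term kept only if it improves held-out performance. That selection step is left out here because the abstract gives no detail on how it is done.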

Keywords: logistic regression, random forest, interaction mining, variable selection, automated model building, ensemble performance regression

Suggested Citation

Sharma, Dhruv, I*: Optimizing Logistic Regression to Match Ensemble Performance Using Random Forest Variable Importance (June 5, 2011). Available at SSRN: https://ssrn.com/abstract=1858378 or http://dx.doi.org/10.2139/ssrn.1858378

Dhruv Sharma (Contact Author)

Independent

2023 N. Cleveland St.
Arlington, VA 22201
United States

HOME PAGE: http://theinterdisciplinarian.com/
