De-Biased Random Forest Variable Selection

24 Pages. Posted: 22 Dec 2011. Last revised: 18 Feb 2013

Date Written: December 22, 2011

Abstract

This paper proposes a new way to de-bias random forest variable selection using a clean random forest algorithm. Strobl et al. (2007) have shown that random forest variable importance is biased toward variables with many levels or categories, toward variables on different scales of measurement, and toward correlated variables, which can inflate some variable importance measures. The proposed algorithm builds random forests without each variable in turn and keeps a variable when dropping it degrades overall random forest performance. The algorithm is simple and straightforward, and its complexity and speed are a function of the number of salient variables. It runs more efficiently than the permutation test algorithm and offers an alternative way to address the known biases. The paper concludes with some normative guidance on how to use random forest variable importance.
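The drop-one-variable idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `drop_one_selection`, the `tolerance` parameter, the variable names, and the `toy_score` stand-in are all hypothetical. In particular, `toy_score` replaces the step "fit a random forest on this subset and return its held-out accuracy" so that the example stays self-contained, and comparing each reduced model against a single full-model baseline is an assumption about how "degrades performance" is measured.

```python
from typing import Callable, List, Sequence


def drop_one_selection(
    variables: Sequence[str],
    score: Callable[[Sequence[str]], float],
    tolerance: float = 0.0,
) -> List[str]:
    """Keep a variable only if the model scores worse without it.

    `score` is assumed to fit a model on the given variable subset and
    return a performance number where higher is better.
    """
    baseline = score(list(variables))  # performance with every variable
    kept = []
    for var in variables:
        reduced = [v for v in variables if v != var]
        # A drop in performance larger than `tolerance` marks the variable as salient.
        if baseline - score(reduced) > tolerance:
            kept.append(var)
    return kept


def toy_score(subset: Sequence[str]) -> float:
    """Hypothetical stand-in for fitting a random forest and scoring it.

    Only "income" and "age" carry signal in this made-up example.
    """
    contribution = {"income": 0.15, "age": 0.10}
    return 0.5 + sum(contribution.get(v, 0.0) for v in subset)


variables = ["income", "age", "zip_code", "customer_id"]
selected = drop_one_selection(variables, toy_score, tolerance=0.01)
print(selected)
```

In a real setting, `score` would wrap an actual random forest fit, for example using out-of-bag or cross-validated accuracy, and `tolerance` would need to be chosen with the run-to-run noise of the forest in mind.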

Keywords: random forest, variable importance, interaction effects, logistic regression, predictive modeling, biases

Suggested Citation

Sharma, Dhruv, De-Biased Random Forest Variable Selection (December 22, 2011). Available at SSRN: https://ssrn.com/abstract=1975801 or http://dx.doi.org/10.2139/ssrn.1975801

Dhruv Sharma (Contact Author)

Independent

2023 N. Cleveland St.
Arlington, VA 22201
United States

HOME PAGE: http://theinterdisciplinarian.com/
