Cluster-based Feature Selection

Market Technician, Issue 90, March 2021, 11-22

Posted: 14 Jul 2021

Date Written: Dec 22, 2020

Abstract

Feature importance in machine learning measures how much information a feature contributes when building a supervised learning model. It lets us exclude uninformative features from the predictive model (feature selection) and improves the human interpretability of the resulting model. Recently, Man & Chan (2021) compared the stability of features selected by different methods, such as MDA, SHAP, or LIME, when they are subject to the computational randomness of the selection algorithms. In this article, we study whether the cluster-based MDA (cMDA) method proposed by López de Prado (2020) improves predictive performance, feature stability, and model interpretability. We applied cMDA to two synthetic datasets, a public clinical dataset, and two financial datasets. In all cases, the stability and interpretability of the cMDA-selected features are superior to those of the MDA-selected features.
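To make the idea concrete, the following is a minimal sketch of a cluster-level mean decrease accuracy (MDA) score, not the authors' implementation. Several choices here are assumptions made for illustration only: the function name `clustered_mda`, the hand-specified `clusters` mapping (cMDA as described derives the clusters from the data), the ordinary least squares stand-in model, held-out R² as the score, and the toy data. The point it illustrates is that all features in a cluster are shuffled together, so correlated features cannot substitute for one another and mask each other's importance.

```python
import numpy as np


def clustered_mda(X, y, clusters, n_repeats=20, seed=0):
    """Mean drop in held-out R^2 when a whole feature cluster is shuffled.

    Hypothetical helper for illustration; `clusters` maps a cluster name
    to the list of column indices that belong to it.
    """
    rng = np.random.default_rng(seed)
    split = X.shape[0] // 2
    X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

    # Stand-in model (assumption): ordinary least squares with an intercept.
    A_tr = np.column_stack([np.ones(split), X_tr])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

    def r2(X_eval):
        pred = np.column_stack([np.ones(len(X_eval)), X_eval]) @ beta
        ss_res = np.sum((y_te - pred) ** 2)
        ss_tot = np.sum((y_te - y_te.mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    base = r2(X_te)
    importance = {}
    for name, cols in clusters.items():
        drops = []
        for _ in range(n_repeats):
            X_perm = X_te.copy()
            # Shuffle the rows of every column in the cluster jointly.
            perm = rng.permutation(len(X_te))
            X_perm[:, cols] = X_te[perm][:, cols]
            drops.append(base - r2(X_perm))
        importance[name] = float(np.mean(drops))
    return importance


# Toy data (assumption): two near-duplicate informative features, one noise feature.
rng = np.random.default_rng(42)
n = 2000
signal = rng.normal(size=n)
X = np.column_stack([
    signal + 0.1 * rng.normal(size=n),
    signal + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])
y = signal + 0.1 * rng.normal(size=n)

clusters = {"informative": [0, 1], "noise": [2]}
scores = clustered_mda(X, y, clusters)
```

In this toy setup, shuffling either informative column alone would leave its near-duplicate in place to carry the signal, which is why the sketch scores the two columns as one cluster. The repeated shuffles and fixed seed also show where the computational randomness studied in the stability comparison enters.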

Suggested Citation

Man, Xin and Chan, Ernest, Cluster-based Feature Selection (Dec 22, 2020). Market Technician, Issue 90, March 2021, 11-22, Available at SSRN: https://ssrn.com/abstract=3880641

Xin Man

PredictNow.ai

56 Niagara on the Green Blvd
Niagara-on-the-Lake, L0S 1J0
Canada

Ernest Chan (Contact Author)

PredictNow.ai

56 Niagara on the Green Blvd
Niagara-on-the-Lake, L0S 1J0
Canada

