Re-Smote: A Novel Imbalanced Sampling Method Based on Smote with Radius Estimation

14 Pages. Posted: 16 Feb 2023

Huiyuan Jiang

Northeastern University

Yiyang Li

Northeastern University

Baoyu Liu

Northeastern University

Keming Mao

Northeastern University

Yuhai Zhao

Northeastern University

Abstract

Class imbalance is a distinctive feature of many datasets, and how to balance such datasets has become a prominent topic in machine learning. The Synthetic Minority Oversampling Technique (SMOTE) is the classical method for this problem. Although SMOTE has been studied extensively, the problem of synthetic sample singularity remains. To address both class imbalance and the diversity of generated samples, this paper proposes RE-SMOTE, a hybrid resampling method for binary imbalanced datasets that builds on improvements to two oversampling methods, PF-SMOTE and SMOTE-WENN. First, the minority class samples are divided into safe minority and boundary minority samples, as in PF-SMOTE. The boundary minority samples are re-organized by linear interpolation with their nearest majority class samples, and the safe minority samples are re-organized within a circular range that takes the initial safe minority sample as its center and the distance to the nearest majority sample as its radius. Second, noise samples are further cleaned according to relative density, using WENN. Relative density is computed as the ratio between the number of majority samples and the number of minority samples among a sample's reverse k-nearest neighbors. To verify the effectiveness and robustness of the proposed method, we conducted a comprehensive experimental study on 40 datasets selected from real applications. The experimental results show the superiority of RE-SMOTE over other state-of-the-art methods.
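
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the safe/boundary rule, the interpolation range, or the cleaning threshold, so the majority-vote rule in `split_minority`, the `max_step` cap, the 2-D restriction in `synth_safe`, and all function names are assumptions made for illustration. The sketch also assumes that "re-organized" means generating new synthetic points.

```python
import math
import random

def nearest(point, candidates):
    """Return (distance, candidate) for the candidate closest to point."""
    best = min(candidates, key=lambda c: math.dist(point, c))
    return math.dist(point, best), best

def split_minority(minority, majority, k=3):
    """Assumed rule: a minority point is 'safe' when more than half of its
    k nearest neighbours (over both classes) are minority points."""
    labelled = [(p, 1) for p in minority] + [(q, 0) for q in majority]
    safe, boundary = [], []
    for p in minority:
        others = [(math.dist(p, q), lab) for q, lab in labelled if q is not p]
        others.sort(key=lambda t: t[0])
        n_min = sum(lab for _, lab in others[:k])
        (safe if 2 * n_min > k else boundary).append(p)
    return safe, boundary

def synth_boundary(p, majority, rng, max_step=0.5):
    """Interpolate linearly from p toward its nearest majority point.
    max_step is a hypothetical cap that keeps the new point nearer to p."""
    _, m = nearest(p, majority)
    t = rng.uniform(0.0, max_step)
    return tuple(a + t * (b - a) for a, b in zip(p, m))

def synth_safe(p, majority, rng):
    """Sample uniformly inside the circle centred at p whose radius is the
    distance to the nearest majority point (2-D only in this sketch)."""
    radius, _ = nearest(p, majority)
    r = radius * math.sqrt(rng.random())  # sqrt gives uniform area density
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (p[0] + r * math.cos(theta), p[1] + r * math.sin(theta))

def relative_density(idx, points, labels, k=3):
    """Ratio of majority (label 0) to minority (label 1) points among the
    reverse k-nearest neighbours of points[idx], i.e. the points that
    count points[idx] among their own k nearest neighbours."""
    n_maj = n_min = 0
    for j, q in enumerate(points):
        if j == idx:
            continue
        order = sorted((i for i in range(len(points)) if i != j),
                       key=lambda i: math.dist(q, points[i]))
        if idx in order[:k]:
            if labels[j] == 0:
                n_maj += 1
            else:
                n_min += 1
    if n_min == 0:
        # No minority reverse neighbours: infinite ratio, or 0.0 if none at all.
        return math.inf if n_maj else 0.0
    return n_maj / n_min
```

In this sketch a large relative density marks a sample surrounded mostly by majority points, which is the kind of sample a cleaning step might treat as noise; the actual threshold and decision rule would have to come from the full paper.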

Keywords: imbalanced data sampling, SMOTE, radius estimation

Suggested Citation

Jiang, Huiyuan and Li, Yiyang and Liu, Baoyu and Mao, Keming and Zhao, Yuhai, Re-Smote: A Novel Imbalanced Sampling Method Based on Smote with Radius Estimation. Available at SSRN: https://ssrn.com/abstract=4361702 or http://dx.doi.org/10.2139/ssrn.4361702

Huiyuan Jiang

Northeastern University

220 B RP
Boston, MA 02115
United States

Yiyang Li

Northeastern University

220 B RP
Boston, MA 02115
United States

Baoyu Liu

Northeastern University

220 B RP
Boston, MA 02115
United States

Keming Mao (Contact Author)

Northeastern University

220 B RP
Boston, MA 02115
United States

Yuhai Zhao

Northeastern University

220 B RP
Boston, MA 02115
United States
