RE-SMOTE: A Novel Imbalanced Sampling Method Based on SMOTE with Radius Estimation
14 Pages Posted: 16 Feb 2023
Abstract
Class imbalance is a distinctive feature of many datasets, and how to balance such datasets has become a hot topic in machine learning. The Synthetic Minority Oversampling Technique (SMOTE) is the classical method for this problem. Although SMOTE has been studied extensively, the problem of synthetic sample singularity (low diversity among the generated samples) remains. To address both class imbalance and the diversity of generated samples, this paper proposes RE-SMOTE, a hybrid resampling method for binary imbalanced datasets that builds on improvements to two oversampling methods, PF-SMOTE and SMOTE-WENN. First, the minority class samples are divided into safe minority and boundary minority samples, as in PF-SMOTE. New samples are generated from the boundary minority samples by linear interpolation with their nearest majority class samples, and from the safe minority samples by sampling within a circular region centered on the original safe minority sample, whose radius is the distance to its nearest majority sample. Second, noisy samples are cleaned according to relative density, using WENN. Relative density is the ratio of the number of majority samples to the number of minority samples among a sample's reverse k-nearest neighbors. To verify the effectiveness and robustness of the proposed method, we conducted a comprehensive experimental study on 40 datasets selected from real applications. The experimental results show the superiority of RE-SMOTE over other state-of-the-art methods.
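The two generation rules and the relative density described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how points are drawn inside the circular region, how far the interpolation may move toward the majority sample, or how an empty minority count is handled. The uniform draw, the max_step cap, the infinite ratio for a zero minority count, and all function names are therefore assumptions made for illustration.

```python
import numpy as np

def nearest_majority(x, X_maj):
    """Return the majority sample nearest to x and its Euclidean distance."""
    d = np.linalg.norm(X_maj - x, axis=1)
    i = int(np.argmin(d))
    return X_maj[i], float(d[i])

def sample_safe(x, X_maj, rng):
    """Draw one synthetic point inside the ball centered at the safe minority
    sample x, with radius equal to the distance to its nearest majority sample.
    Assumption: the draw is uniform over the ball."""
    _, radius = nearest_majority(x, X_maj)
    direction = rng.normal(size=x.shape)
    direction /= np.linalg.norm(direction)
    # Raising a uniform variate to 1/dim makes the draw uniform in volume.
    r = radius * rng.random() ** (1.0 / x.size)
    return x + r * direction

def sample_boundary(x, X_maj, rng, max_step=0.5):
    """Interpolate linearly from the boundary minority sample x toward its
    nearest majority sample. max_step is a hypothetical cap, not from the paper."""
    m, _ = nearest_majority(x, X_maj)
    return x + max_step * rng.random() * (m - x)

def relative_density(i, X, y, k, minority_label=1):
    """Ratio of majority to minority samples among the reverse k-nearest
    neighbors of X[i], i.e. the samples that count X[i] among their own k
    nearest neighbors. Assumption: returns inf when no minority sample is found."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)              # a sample is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]       # k nearest neighbors of every sample
    rev = np.where((knn == i).any(axis=1))[0]
    n_min = int(np.sum(y[rev] == minority_label))
    n_maj = int(rev.size) - n_min
    return n_maj / n_min if n_min > 0 else float("inf")
```

For example, with rng = np.random.default_rng(0), calling sample_safe(x, X_maj, rng) repeatedly for each safe minority sample x yields points that never lie farther from x than its nearest majority sample does, which is the property the radius estimation is meant to provide.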
Keywords: imbalanced data sampling, SMOTE, radius estimation