Highly Improve the Clustering Accuracy by Ameliorating Dataset with Object Hopping
25 Pages. Posted: 9 Sep 2024
Abstract
To improve the accuracy of clustering algorithms on complex datasets, a series of studies have sought to make datasets friendlier to clustering algorithms by ameliorating the data distribution. However, these studies calculate a movement magnitude for each object that does not match the object's density, and they apply an artificial, uniform convergence criterion to both sparse and dense clusters. As a result, they cannot shrink sparse and dense clusters synchronously, which significantly reduces their effectiveness on density-imbalanced datasets. To address this, we propose a novel dataset-amelioration method called HIOH (Highly Improve the Clustering Accuracy by Ameliorating Dataset with Object Hopping). HIOH calculates a density-matched hopping radius for each object so that the object's movement magnitude matches its density. Based on the hopping radius, a "Hopping" mechanism is designed to eliminate the artificial convergence criterion and to guarantee synchronous shrinkage of both dense and sparse clusters. We conducted numerous experiments to validate HIOH. Experimental results indicate that, in terms of 5 evaluation metrics (ARI, NMI, FMI, PUR, VM), HIOH outperforms baseline algorithms by at least 33.5% (ARI), 27.5% (NMI), 14% (FMI), 9.5% (PUR), and 28.5% (VM) in improving the accuracy of clustering algorithms on density-imbalanced datasets. The code and datasets are available at https://github.com/XJaiYH/HIOH.
Keywords: Clustering, Dataset amelioration, Improving accuracy, Density-imbalanced datasets
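For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how two of them, purity (PUR) and the adjusted Rand index (ARI), are commonly computed from ground-truth and predicted labels. It uses the standard textbook definitions and is not taken from the HIOH repository; the function names `purity` and `adjusted_rand_index` are illustrative assumptions.

```python
from collections import Counter
from math import comb


def purity(labels_true, labels_pred):
    """Fraction of objects that belong to the majority true class of their cluster."""
    # Assumption: standard purity definition, not code from the HIOH repository.
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert and Arabie form) from pair counts."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no object pairs to compare; treated as trivial agreement
    # Pairs of objects grouped together in both labelings (contingency cells).
    sum_cells = sum(comb(v, 2) for v in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together in each labeling separately.
    sum_rows = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_cols = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    predicted = [0, 0, 1, 1, 1, 1]
    print("PUR:", purity(truth, predicted))
    print("ARI:", adjusted_rand_index(truth, predicted))
```

Both functions depend only on how objects are grouped, not on the label values themselves, so a clustering that recovers the true groups under different label names still scores 1.0.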