Highly Improve the Clustering Accuracy by Ameliorating Dataset with Object Hopping

25 Pages Posted: 9 Sep 2024

Xianjun Zeng

Beijing Institute of Technology

Shuliang Wang

Beijing Institute of Technology

Qi Li

Beijing Institute of Technology

Sijie Ruan

Beijing Institute of Technology

Haoxiang Xu

Beijing Institute of Technology

Abstract

To improve the accuracy of clustering algorithms on complex datasets, a series of studies has sought to make datasets friendlier to clustering by ameliorating the data distribution. However, these studies compute a movement magnitude for each object that does not match the object's density, and they apply an artificial, uniform convergence criterion to both sparse and dense clusters. As a result, they cannot shrink sparse and dense clusters synchronously, which significantly reduces their effectiveness on density-imbalanced datasets. To address this, we propose a novel dataset-amelioration method called HIOH (Highly Improve the Clustering Accuracy by Ameliorating Dataset with Object Hopping). HIOH calculates a density-matched hopping radius for each object so that the object's movement magnitude matches its density. Based on the hopping radius, a "Hopping" mechanism is designed to eliminate the artificial convergence criterion and to guarantee synchronous shrinkage of both dense and sparse clusters. We conducted extensive experiments to validate HIOH. The results indicate that, across five evaluation metrics (ARI, NMI, FMI, PUR, VM), HIOH outperforms baseline algorithms by at least 33.5% (ARI), 27.5% (NMI), 14% (FMI), 9.5% (PUR), and 28.5% (VM) in improving the accuracy of clustering algorithms on density-imbalanced datasets. The code and datasets are available at https://github.com/XJaiYH/HIOH.
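
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how three of them (PUR, ARI, FMI) can be computed from ground-truth and predicted labels using their standard textbook definitions. It is not the authors' evaluation code, and the function names and toy labels are assumptions made for illustration; NMI and VM are omitted for brevity.

```python
from collections import Counter
from math import comb, sqrt


def contingency(labels_true, labels_pred):
    """Count objects for each (true class, predicted cluster) pair."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    return Counter(zip(labels_true, labels_pred))


def purity(labels_true, labels_pred):
    """PUR: fraction of objects in the majority class of their cluster."""
    best = {}
    for (_, cluster), count in contingency(labels_true, labels_pred).items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(labels_true)


def _pair_counts(labels_true, labels_pred):
    """Return pair counts shared by ARI and FMI.

    same_both: pairs grouped together in both labelings
    same_true: pairs grouped together in the ground truth
    same_pred: pairs grouped together in the prediction
    total:     all unordered pairs of objects
    """
    table = contingency(labels_true, labels_pred)
    same_both = sum(comb(c, 2) for c in table.values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(len(labels_true), 2)
    return same_both, same_true, same_pred, total


def adjusted_rand_index(labels_true, labels_pred):
    """ARI: pair agreement corrected for chance (1.0 means identical partitions)."""
    same_both, same_true, same_pred, total = _pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0  # fewer than two objects: partitions trivially agree
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (same_both - expected) / (max_index - expected)


def fowlkes_mallows(labels_true, labels_pred):
    """FMI: geometric mean of pairwise precision and recall."""
    same_both, same_true, same_pred, _ = _pair_counts(labels_true, labels_pred)
    if same_true == 0 or same_pred == 0:
        return 0.0
    return same_both / sqrt(same_true * same_pred)


# Hypothetical toy labels: one object from class 0 is placed in cluster 1.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print("PUR:", purity(truth, pred))
print("ARI:", adjusted_rand_index(truth, pred))
print("FMI:", fowlkes_mallows(truth, pred))
```

All three scores depend only on how objects are grouped, not on the label values themselves, so a clustering that matches the ground truth under renamed cluster IDs still scores 1.0.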

Keywords: Clustering, Dataset amelioration, Improving accuracy, Density-imbalanced datasets

Suggested Citation

Zeng, Xianjun and Wang, Shuliang and Li, Qi and Ruan, Sijie and Xu, Haoxiang, Highly Improve the Clustering Accuracy by Ameliorating Dataset with Object Hopping. Available at SSRN: https://ssrn.com/abstract=4951444 or http://dx.doi.org/10.2139/ssrn.4951444

Xianjun Zeng

Beijing Institute of Technology

5 South Zhongguancun street
Center for Energy and Environmental Policy Researc
Beijing, 100081
China

Shuliang Wang (Contact Author)

Beijing Institute of Technology

5 South Zhongguancun street
Center for Energy and Environmental Policy Researc
Beijing, 100081
China
