Classification of Cervical Cancer Dataset
Avishek Choudhury, Wesabi, Classification of Cervical Cancer Dataset, Proceedings of the 2018 IISE Annual Conference, Orlando, p. 1456-1461
6 Pages. Posted: 23 Dec 2018
Date Written: December 7, 2018
Cervical cancer is the leading gynecological malignancy worldwide. This paper compares several classification techniques and shows the advantage of feature selection for predicting cervical cancer. The dataset has 32 attributes and 858 samples, and it suffers from missing values and class imbalance. To address the imbalance, over-sampling, under-sampling, and embedded over- and under-sampling are used. Dimensionality reduction is also needed to improve classifier accuracy, so feature selection methods from two distinct categories, filters and wrappers, are studied. The results show that age, first sexual intercourse, number of pregnancies, smokes, hormonal contraceptives, and STDs: genital herpes are the main predictive features, yielding an accuracy of 97.5%. The Decision Tree classifier is shown to handle the classification task with excellent performance.
Keywords: Cervical cancer, feature selection, classification, imbalanced data, over-sampling
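The over-sampling and under-sampling steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which sampling algorithms were used, so the plain random variants below are an assumption, and the function names (random_oversample, random_undersample) and the toy data are hypothetical. Only the Python standard library is used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (assumed variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class has as many
    rows as the smallest class (assumed variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy imbalanced data (made up for illustration): 8 negatives, 2 positives.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))
print(Counter(under_labels))
```

Over-sampling keeps every original row at the cost of duplicates, while under-sampling discards majority-class rows. On a dataset of only 858 samples that trade-off matters, which may be why the abstract also considers a combination of the two.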