Spark Based Framework for Breast Cancer Analysis

10 Pages Posted: 27 Feb 2018

B Sathiyabhama

Sona College of Technology - Department of Computer Science and Engineering

S Udhaya Kumar

Sona College of Technology

J. Jayanthi

Sona College of Technology

T. Sathiya

Sona College of Technology

A.K. Ilavarasi

Sona College of Technology

V. Yuvarajan

Sona College of Technology

Konga Gopikrishna

Ecole Normale Supérieure d'Abidjan - Department of Science and Technology

Date Written: November 15, 2017

Abstract

Breast cancer is the second most common cancer worldwide and accounts for one-fourth of all cancers in women. Compared with other diseases, it causes a larger number of deaths in many countries. Early identification of a breast tumor improves the chance of a cure; therefore, extensive research is under way to find techniques that can identify breast cancer in its initial phases. The healthcare sector holds a tremendous amount of important data about patients and their health conditions. Hence, there is a pressing need for medical practitioners to use that data to predict the disease. Numerous researchers have approached this problem using Machine Learning (ML) techniques, improving prediction by applying different tree-based classifiers. However, most tree-based ML algorithms cannot handle large amounts of complex data. This study addresses the issue by running efficient tree-based classifiers (Decision Tree, Random Forest, and Gradient Boosting) on the Apache Spark framework. The experiments use the Wisconsin Breast Cancer Dataset (WBCD) from the UCI repository. The experimental results show that the Random Forest classifier outperformed the other two tree-based classification algorithms in most cases in this study.

Keywords: Breast Cancer Wisconsin diagnostic dataset, Apache Spark, Big data in Healthcare, Decision Tree, Random Forest, Gradient Boosting Classifier
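
As a rough illustration of what the tree-based classifiers named in the abstract do at each node, the sketch below picks a threshold on one numeric feature by minimizing weighted Gini impurity, a common split criterion for classification trees. It is not the authors' implementation and does not use Spark; the function names (`gini`, `best_split`) and the toy values are hypothetical and are not drawn from the WBCD.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the best binary split 'value <= threshold' on one feature.

    Returns (threshold, weighted_impurity). The threshold is None when
    no split improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up toy data (assumption, for illustration only):
# one numeric feature and a benign ("B") / malignant ("M") label.
sizes = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
diagnosis = ["B", "B", "B", "M", "M", "M"]

threshold, impurity = best_split(sizes, diagnosis)
print(threshold, impurity)
```

A decision tree repeats this search over every feature and recurses on the two resulting subsets. A random forest trains many such trees on resampled data and combines their votes, and gradient boosting fits trees in sequence to correct earlier errors.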

Suggested Citation

Sathiyabhama, B and Udhaya Kumar, S and Jayanthi, J. and Sathiya, T. and Ilavarasi, A.K. and Yuvarajan, V. and Gopikrishna, Konga, Spark Based Framework for Breast Cancer Analysis (November 15, 2017). Proceedings of the International Conference on Intelligent Computing Systems (ICICS 2017 – Dec 15th - 16th 2017) organized by Sona College of Technology, Salem, Tamilnadu, India. Available at SSRN: https://ssrn.com/abstract=3125283 or http://dx.doi.org/10.2139/ssrn.3125283

B Sathiyabhama (Contact Author)

Sona College of Technology - Department of Computer Science and Engineering

Junction Main Road
Suramangalam
Salem, Tamil Nadu 636005
India

S Udhaya Kumar

Sona College of Technology

Junction Main Road
Suramangalam
Salem, Tamil Nadu 636005
India

J. Jayanthi

Sona College of Technology

Junction Main Road
Suramangalam
Salem, Tamil Nadu 636005
India

T. Sathiya

Sona College of Technology

Junction Main Road
Suramangalam
Salem, Tamil Nadu 636005
India

A.K. Ilavarasi

Sona College of Technology

Junction Main Road
Suramangalam
Salem, Tamil Nadu 636005
India

V. Yuvarajan

Sona College of Technology

Junction Main Road
Suramangalam
Salem, Tamil Nadu 636005
India

Konga Gopikrishna

Ecole Normale Supérieure d'Abidjan - Department of Science and Technology

Abidjan
Ivory Coast (Cote D'ivoire)
