Performance Analysis of Data Mining Algorithms for Diagnosis and Prediction of Heart and Breast Cancer Disease
Review of Research, Vol. 3, Issue 8, May 2014
16 Pages. Posted: 30 Jun 2017
Date Written: June 29, 2017
Heart disease and other cardiovascular diseases are the leading cause of death worldwide, and they are projected to remain so. An estimated 17 million people died from cardiovascular disease in 2005, representing 30% of all global deaths; of these, 7.2 million deaths were due to heart attacks and 5.7 million to stroke. About 80% of these deaths occurred in low- and middle-income countries. If current trends are allowed to continue, an estimated 23.6 million people will die from cardiovascular disease by 2030, mainly from heart attacks and strokes.
Breast cancer is the second most common cancer in women. The World Health Organization's International Agency for Research on Cancer estimated that more than 150,000 women worldwide die of breast cancer each year. In India, breast cancer accounts for 23% of all female cancer deaths, followed by cervical cancer at 17.5%.
The main objective of this manuscript is to report on a research project in which we took advantage of available advances in data mining to develop prediction models for heart disease patients and for breast cancer survivability. We used five popular data mining algorithms (Naïve Bayes, RBF Network, Simple Logistic, J48 and Decision Table) to develop the prediction models on two datasets (270 heart disease cases and 683 breast cancer cases). We used 10-fold cross-validation to obtain an unbiased estimate of the performance of each of the five prediction models for comparison purposes. The results, based on accuracy averaged over the heart disease and breast cancer datasets, indicated that Naïve Bayes is the best predictor, with 87.01% accuracy on the held-out data (a prediction accuracy better than any reported in the literature). RBF Network came second with 86.9% accuracy, Simple Logistic third with 85.65%, J48 fourth with 84.85%, and the Decision Table model was the worst of the five with 83.34% accuracy.
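The evaluation protocol described above (10-fold cross-validation, with accuracy averaged over the held-out folds) can be sketched as follows. This is a minimal, from-scratch Python illustration under stated assumptions, not the authors' code or toolkit: it uses a Gaussian Naive Bayes (one normal distribution per class and feature, a common choice for numeric attributes), plain unstratified folds, and hypothetical helper names (`k_fold_indices`, `GaussianNaiveBayes`, `cross_val_accuracy`). The study's datasets are not reproduced here.

```python
import math
import random
from collections import defaultdict
from statistics import mean

# Hypothetical helper names; an illustrative sketch, not the study's implementation.


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds whose sizes differ by at most 1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index starting at position i (unstratified).
    return [idx[i::k] for i in range(k)]


class GaussianNaiveBayes:
    """Naive Bayes assuming independent, normally distributed features within each class."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.priors, self.stats = {}, {}
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(X)
            per_feature = []
            for col in zip(*rows):
                mu = mean(col)
                var = sum((v - mu) ** 2 for v in col) / len(col)
                # Variance floor avoids division by zero on constant features.
                per_feature.append((mu, max(var, 1e-9)))
            self.stats[label] = per_feature
        return self

    def predict_one(self, row):
        def log_posterior(label):
            # log P(class) + sum of log Gaussian likelihoods of each feature value.
            lp = math.log(self.priors[label])
            for v, (mu, var) in zip(row, self.stats[label]):
                lp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            return lp

        return max(self.priors, key=log_posterior)


def cross_val_accuracy(X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return the mean accuracy over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = GaussianNaiveBayes().fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model.predict_one(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)
```

For example, `cross_val_accuracy(X, y, k=10)` returns the mean held-out accuracy for one dataset; under the ranking in the abstract, each algorithm's score would then be the mean of its heart disease and breast cancer figures. Stratified folds, which preserve the class proportions in every fold, are a common refinement of the plain split used here.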
Keywords: Cardiovascular disease, Breast cancer, Naïve Bayes, RBF Network, Simple Logistic, J48, Decision Table.