Big data Classification based on Distributed Fuzzy Decision Trees
8 Pages Posted: 17 Apr 2020
Date Written: June 2019
Abstract
Combining decision tree algorithms with fuzzy techniques provides a way to classify large amounts of uncertain data. However, time and space complexity remain an issue. The proposed system is a distributed implementation of fuzzy decision trees (FDTs), based on the MapReduce paradigm, that classifies large amounts of data. It generates fuzzy partitions on each continuous attribute in a distributed manner, and these partitions are given as input to the distributed FDT learning step, which generates the final fuzzy decision tree model. The Gini index is used as the criterion for selecting the attribute at each decision tree node, which requires less computation and thereby reduces running time. The distributed implementation of fuzzy decision trees is compared with non-distributed fuzzy decision trees and classical decision trees in terms of accuracy and time complexity. The comparison shows that the distributed fuzzy decision trees outperform both classical and non-distributed fuzzy decision trees, achieving greater accuracy in less time.
Keywords: Big Data, Classification, RDD, Fuzzy Decision Trees, Gini Index, Fuzzy Partitions.
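The abstract names two building blocks, fuzzy partitions on continuous attributes and Gini-based attribute selection, without giving formulas. The following single-machine Python sketch illustrates one common way to combine them; it is not the authors' implementation. It assumes a strong triangular partition defined by a hypothetical list of set centers, and a Gini index computed from membership-weighted class masses. All function names are illustrative.

```python
from collections import defaultdict


def triangular_memberships(x, centers):
    """Membership degrees of x in a strong triangular fuzzy partition.

    `centers` is a sorted list of fuzzy-set cores (an assumption of this
    sketch). Degrees are non-negative and sum to 1 for any x.
    """
    if x <= centers[0]:
        return [1.0] + [0.0] * (len(centers) - 1)
    if x >= centers[-1]:
        return [0.0] * (len(centers) - 1) + [1.0]
    degrees = [0.0] * len(centers)
    for i in range(len(centers) - 1):
        left, right = centers[i], centers[i + 1]
        if left <= x <= right:
            # x lies between two cores: split its membership linearly.
            degrees[i + 1] = (x - left) / (right - left)
            degrees[i] = 1.0 - degrees[i + 1]
            break
    return degrees


def fuzzy_gini(weighted_labels):
    """Gini index 1 - sum_c p_c^2, with p_c taken from membership mass."""
    mass = defaultdict(float)
    for weight, label in weighted_labels:
        mass[label] += weight
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def split_gini(values, labels, centers):
    """Membership-weighted Gini of splitting on one continuous attribute."""
    branches = [[] for _ in centers]
    for x, y in zip(values, labels):
        for k, mu in enumerate(triangular_memberships(x, centers)):
            if mu > 0:
                branches[k].append((mu, y))
    total = sum(w for branch in branches for w, _ in branch)
    if total == 0:
        return 0.0
    return sum(
        sum(w for w, _ in branch) / total * fuzzy_gini(branch)
        for branch in branches
    )
```

At each node, the attribute with the lowest `split_gini` would be chosen. Because the per-branch class masses are plain sums, they could in principle be accumulated per data partition and then merged, which is what makes this kind of criterion a natural fit for a MapReduce-style computation.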