Gene Expression Analysis Using Clustering Techniques and Evaluation Indices
6 Pages Posted: 11 Apr 2019 Last revised: 25 Aug 2019
Date Written: March 11, 2019
Data mining refers to the nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data. Data mining models can be classified into several types, such as clustering, decision trees, association rules, and sequential patterns and time series. This work focuses on clustering techniques for analyzing gene expression data from a bioinformatics perspective. Innovative technologies in experimental molecular biology, such as DNA microarrays, have produced huge amounts of valuable gene expression profile data. It is now possible to monitor the expression levels of thousands of genes simultaneously during important biological processes and across collections of related samples. The constant growth of experimental data has created new challenges in maintaining, storing and analyzing it to derive meaningful patterns. Many clustering algorithms have been proposed for the analysis of gene expression data; however, evaluating which of them are feasible and applicable is becoming an important issue in current bioinformatics research. In this article, four clustering algorithms (K-Means, Hierarchical Clustering, Self-Organizing Map (SOM) and DBSCAN) are studied on Iris flower gene expression datasets. The clustering efficiency of each algorithm is assessed using various external and internal clustering evaluation indices. The results were further analyzed by plotting graphs and charts across the different algorithms, indices and datasets to examine the similarity of the clusters generated by each algorithm and thereby enable a comparison of the clustering methods.
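To make the evaluation idea concrete, here is a minimal pure-Python sketch that scores one of the four algorithms (K-Means) with one external index and one internal index. It is not the authors' implementation. The abstract does not name the indices used, so the adjusted Rand index (ARI, external: compares clusters with reference labels) and the mean silhouette coefficient (internal: uses only the data and the clustering) are assumed here as common representatives. The nine-point `data`/`truth` toy set is hypothetical and stands in for the Iris data. The deterministic farthest-point ("maximin") seeding is swapped in for the usual random or k-means++ initialization so that the run is reproducible.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT the Iris dataset): three well-separated 2-D groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
        (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # reference labels for the external index


def farthest_point_init(points, k):
    """Deterministic seeding: repeatedly pick the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (internal index, range [-1, 1], higher is better).
    Assumes at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def adjusted_rand_index(truth, pred):
    """External index: agreement between two partitions, corrected for chance (1.0 = identical)."""
    def pairs(n):
        return n * (n - 1) / 2
    sum_ij = sum(pairs(v) for v in Counter(zip(truth, pred)).values())
    sum_a = sum(pairs(v) for v in Counter(truth).values())
    sum_b = sum(pairs(v) for v in Counter(pred).values())
    expected = sum_a * sum_b / pairs(len(truth))
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    pred = kmeans(data, k=3)
    print("ARI:", adjusted_rand_index(truth, pred))
    print("Silhouette:", round(silhouette(data, pred), 3))
```

The ARI is invariant to how clusters are numbered, so K-Means labels need not match `truth` label-for-label. The other algorithms named in the abstract (hierarchical clustering, SOM, DBSCAN) could be scored with the same two functions by passing in their label lists.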
Keywords: Clustering, Gene Expression Data, Data Mining