Gene Expression Analysis Using Clustering Techniques and Evaluation Indices

6 Pages. Posted: 11 Apr 2019. Last revised: 25 Aug 2019.

Anand Bihari

National Institute of Technology (NIT), Patna; VIT University, Vellore - School of Information Technology and Engineering

Sudhakar Tripathi

Rajkiya Engineering College

Akshay Deepak

National Institute of Technology (NIT), Patna - Department of Computer Science and Engineering

Date Written: March 11, 2019

Abstract

Data mining is the nontrivial process of deriving and identifying valid, novel, potentially useful, and ultimately understandable patterns in data. Data mining models include clustering, decision trees, association rules, and sequential pattern and time series models. This work focuses on clustering techniques for analyzing gene expression data in bioinformatics. Innovative technologies in experimental molecular biology, such as DNA microarrays, have produced huge amounts of valuable gene expression profile data. It is now possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. The constant growth of experimental data has created new challenges in maintenance, storage, and analysis for deriving meaningful patterns. Many clustering algorithms have been proposed for the analysis of gene expression data, and evaluating which of them are feasible and applicable has become an important issue in current bioinformatics research. In this article, four clustering algorithms (K-Means, Hierarchical Clustering, Self-Organizing Map (SOM), and DBSCAN) are studied on Iris flower gene expression datasets. The clustering efficiency of each algorithm is assessed using various external and internal clustering evaluation indices. The results are further analyzed by plotting graphs and charts across the different algorithms, indices, and datasets to examine the similarity of the clusters the algorithms generate and thereby enable comparison of the clustering methods.

Keywords: Clustering, Gene Expression Data, Data Mining
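
The evaluation approach described in the abstract, clustering a dataset and then scoring the result with an external and an internal index, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the specific indices or settings used, so the sketch assumes the Rand index as an example of an external index (it compares clusters against known class labels) and the mean silhouette coefficient as an example of an internal index (it uses only the data and the cluster assignment). The toy points, the labels in `truth`, the farthest-first initialisation, and all function names are hypothetical and chosen only to keep the example deterministic and self-contained.

```python
import math
from itertools import combinations


def kmeans(points, k, iters=100):
    """Plain K-Means with a deterministic farthest-first initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def rand_index(truth, pred):
    """External index: fraction of point pairs on which two labelings agree."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)


def silhouette(points, labels):
    """Internal index: mean silhouette coefficient over all points."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)                                # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups with known class labels.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
truth = [0, 0, 0, 1, 1, 1]

labels = kmeans(points, k=2)
print("labels:", labels)
print("Rand index (external):", rand_index(truth, labels))
print("silhouette (internal):", round(silhouette(points, labels), 3))
```

The same two scoring functions could be applied unchanged to the label vectors produced by any other clustering algorithm, which is what makes index-based comparison across algorithms possible.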

Suggested Citation

Bihari, Anand and Tripathi, Sudhakar and Deepak, Akshay, Gene Expression Analysis Using Clustering Techniques and Evaluation Indices (March 11, 2019). Proceedings of 2nd International Conference on Advanced Computing and Software Engineering (ICACSE) 2019, Available at SSRN: https://ssrn.com/abstract=3350332 or http://dx.doi.org/10.2139/ssrn.3350332

Anand Bihari (Contact Author)

National Institute of Technology (NIT), Patna

Patna
India

VIT University, Vellore - School of Information Technology and Engineering

Gorbachev Rd
Vellore, Tamil Nadu 632014
India

Sudhakar Tripathi

Rajkiya Engineering College

Ambedkar Nagar
India

Akshay Deepak

National Institute of Technology (NIT), Patna - Department of Computer Science and Engineering

Ashok Rajpath, Mahendru
Patna, Bihar 800005
India
