A Fusion-Driven Approach of Attention-Based CNN-BiLSTM for Protein Family Classification - ProFamNet

18 Pages Posted: 13 Nov 2024

Bahar Ali

Institute of Management Sciences

Anwar Shah

National University of Computer and Emerging Sciences (NUCES or FAST-NU)

Malik Niaz

affiliation not provided to SSRN

Musadaq Mansoor

Ghulam Ishaq Khan Institute of Engineering Sciences and Technology

Waleed S. Alnumay

King Saud University

Muhammad Adnan

FAST National University of Computer and Emerging Sciences

Abstract

Automated AI techniques make it possible to classify protein sequences and infer their biological families and functions. Conventional approaches to protein family classification often focus on extracting n-gram features from the sequences while overlooking crucial motif information and the interplay between motifs and neighboring amino acids. Recently, convolutional neural networks have been applied to amino acid and motif data and have improved performance even with a limited dataset of well-characterized proteins. This study presents a model for classifying protein families that fuses a 1D-CNN, a BiLSTM, and an attention mechanism, combining spatial feature extraction, long-term dependencies, and context-aware representations. The proposed model (ProFamNet) is more compact than the state-of-the-art model, with 450,953 parameters and a size of 1.72 MB compared with 4,578,911 parameters and 17.47 MB. It also achieved a higher F1 score (98.30% vs. 97.67%) with more instances (271,160 vs. 55,077) in fewer training epochs (25 vs. 30).
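The reported model sizes can be sanity-checked from the parameter counts with a few lines of arithmetic. The sketch below assumes that each parameter is stored as a 32-bit float (4 bytes) and that "MB" means 2^20 bytes; the abstract states neither, so both are assumptions, and the function name `model_size_mb` is only illustrative.

```python
# Assumptions (not stated in the abstract): 4 bytes per parameter
# (32-bit floats) and 1 MB = 2**20 bytes.
BYTES_PER_PARAM = 4
BYTES_PER_MB = 2 ** 20


def model_size_mb(num_params: int) -> float:
    """Estimate the size of the raw weights in MB under the assumptions above."""
    return num_params * BYTES_PER_PARAM / BYTES_PER_MB


# Parameter counts as reported in the abstract.
profamnet_params = 450_953
baseline_params = 4_578_911

profamnet_mb = model_size_mb(profamnet_params)
baseline_mb = model_size_mb(baseline_params)
reduction = baseline_params / profamnet_params

print(f"ProFamNet: {profamnet_mb:.2f} MB")
print(f"Baseline:  {baseline_mb:.2f} MB")
print(f"Parameter reduction: {reduction:.1f}x")
```

Under these assumptions the estimates come out at 1.72 MB and 17.47 MB, matching the reported sizes, and the parameter count is roughly ten times smaller than that of the comparison model.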

Keywords: protein family classification, deep learning, convolutional neural network, long short-term memory, attention network

Suggested Citation

Ali, Bahar and Shah, Anwar and Niaz, Malik and Mansoor, Musadaq and Alnumay, Waleed S. and Adnan, Muhammad, A Fusion-Driven Approach of Attention-Based CNN-BiLSTM for Protein Family Classification - ProFamNet. Available at SSRN: https://ssrn.com/abstract=5019656 or http://dx.doi.org/10.2139/ssrn.5019656

Bahar Ali

Institute of Management Sciences

1-A, E-5, Phase V
Hayatabad
Peshawar, 25000
Pakistan

Anwar Shah (Contact Author)

National University of Computer and Emerging Sciences (NUCES or FAST-NU)

B-Block, Faisal Town
Lahore, Punjab 54770
Pakistan

Malik Niaz

affiliation not provided to SSRN

Musadaq Mansoor

Ghulam Ishaq Khan Institute of Engineering Sciences and Technology

Topi, Khyber Pakhtunkhwa (KP), Pakistan
Swabi
Pakistan

Waleed S. Alnumay

King Saud University

P.O. Box 2460
Riyadh, 11451
Saudi Arabia

Muhammad Adnan

FAST National University of Computer and Emerging Sciences

Melad Street Faisal Town Lahore
Firdous Market GIII Lahore
Lahore, Punjab 54600
Pakistan
