Protein Function Prediction: Combining Statistical Features with Deep Learning
4 Pages. Posted: 11 Apr 2019
Date Written: February 8, 2019
Functional annotation of proteins from their sequences, which aims to narrow the gap between the number of available proteins and the number with known functional annotations, is a challenging task. From a computational perspective, it requires transforming protein sequences into feature vectors that machine learning algorithms can analyze efficiently. This transformation is difficult, however, because sequences from the same family are highly diverse. Most existing sequence features perform poorly when proteins must be annotated with a large number of functional classes. In this paper, three sequence features, Pseudo Amino Acid Composition (PseAAC), Amino Acid Index Distribution (AAID), and Sequence Graph Transform (SGT), are combined with deep learning techniques to improve performance. Evaluation scores are better when each feature is combined with a deep convolutional neural network (CNN) rather than a deep neural network (DNN): the F1-score of PseAAC + CNN improves by +9.5% over PseAAC + DNN, and the corresponding gains for AAID + CNN and SGT + CNN are +3.22% and +2.33%, respectively.
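To make the "sequence to feature vector" step concrete, the sketch below computes a type-1 PseAAC vector as it is commonly defined (Chou): 20 amino acid frequencies followed by λ sequence-order correlation factors, all normalized together. This is an illustrative sketch, not the authors' implementation; the abstract does not state the values of λ (`lam`) or the weight `weight`, nor which physicochemical scales were used, so these are assumptions. The caller must supply the property scales (e.g. hydrophobicity, hydrophilicity, side-chain mass); none are hard-coded here, and non-standard residue letters are simply dropped (also an assumption).

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(scale):
    """Rescale a raw property scale to zero mean and unit variance over the 20 amino acids."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {aa: (scale[aa] - mu) / sigma for aa in AMINO_ACIDS}


def pseaac(sequence, scales, lam=3, weight=0.05):
    """Type-1 PseAAC sketch: returns a vector of length 20 + lam that sums to 1.

    `scales` is a caller-supplied list of dicts mapping each amino acid letter to a
    property value; `lam` and `weight` are assumed defaults, not values from the paper.
    """
    # Keep only the 20 standard residues (assumption: other letters are ignored).
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    std_scales = [standardize(s) for s in scales]

    def correlation(a, b):
        # Mean squared difference between residues a and b across all property scales.
        return sum((s[b] - s[a]) ** 2 for s in std_scales) / len(std_scales)

    # Sequence-order correlation factors theta_1 .. theta_lam (tier j = residues j apart).
    thetas = []
    for j in range(1, lam + 1):
        pairs = [correlation(seq[i], seq[i + j]) for i in range(len(seq) - j)]
        thetas.append(sum(pairs) / len(pairs))

    # Normalized occurrence frequency of each of the 20 amino acids (sums to 1).
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # Shared normalizer so composition and correlation components sum to 1 together.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    # Placeholder scale for demonstration only; NOT a real physicochemical scale.
    toy_scale = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
    vector = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [toy_scale], lam=3)
    print(len(vector))  # 23 components: 20 composition + 3 correlation factors
```

The weight `weight` controls how much the sequence-order terms contribute relative to plain composition; with `weight = 0` the vector reduces to ordinary amino acid composition padded with zeros.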
Keywords: Pseudo Amino Acid Composition (PseAAC), Amino Acid Index Distribution (AAID), Sequence Graph Transform (SGT), Deep Neural Network (DNN)