iDNA-MS: An Integrated Computational Tool for Detecting DNA Modification Sites in Multiple Genomes

37 Pages. Posted: 2 Mar 2020. Sneak Peek Status: Review Complete

Hao Lv

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Fu-Ying Dao

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Dan Zhang

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Zheng-Xing Guan

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Hui Yang

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Wei Su

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Meng-Lu Liu

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Hui Ding

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Wei Chen

North China University of Science and Technology - Center for Genomics and Computational Biology

Hao Lin

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Abstract

DNA 5-hydroxymethylcytosine (5hmC), N6-methyladenine (6mA) and N4-methylcytosine (4mC) are three common kinds of DNA modification and are involved in a variety of biological processes. Accurate genome-wide identification of 5hmC, 6mA and 4mC sites is invaluable for better understanding their biological functions. Because experimental methods for the genome-wide detection of 5hmC, 6mA and 4mC are labor-intensive and expensive, computational methods for this purpose are urgently needed. With this in mind, the current study was devoted to constructing a machine learning-based method to identify 5hmC, 6mA and 4mC sites in multiple species. We first proposed using K-tuple nucleotide frequency component, nucleotide chemical property and nucleotide frequency, and mono-nucleotide binary encoding schemes to formulate positive and negative samples. Subsequently, Random Forest was used to identify 5hmC, 6mA and 4mC sites. Results of five-fold cross-validation and independent dataset tests showed that the proposed method has excellent generalization ability, suggesting that it is well suited to identifying 5hmC, 6mA and 4mC sites. For the convenience of retrieving 5hmC, 6mA and 4mC sites, a web server called iDNA-MS was established for the proposed method, which is freely accessible at http://lin-group.cn/server/iDNA-MS.
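
The three encoding schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption: the function names are hypothetical, the chemical-property codes follow a commonly used three-bit convention (ring structure, functional group, hydrogen bonding), the nucleotide frequency is taken as the running density of each base up to its position, and sequences are assumed to contain only A, C, G and T. It is not the authors' implementation.

```python
from itertools import product

BASES = "ACGT"

# Assumed three-bit chemical-property codes:
# (ring structure, functional group, hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def kmer_frequencies(seq, k):
    """Frequency of every K-tuple, listed in lexicographic ACGT order."""
    windows = len(seq) - k + 1
    if windows < 1:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(windows):
        counts[seq[i:i + k]] += 1
    return [counts[m] / windows for m in kmers]


def ncp_with_density(seq):
    """Per position: three chemical-property bits plus the running
    density of that base in the prefix ending at the position."""
    features = []
    seen = dict.fromkeys(BASES, 0)
    for position, base in enumerate(seq, start=1):
        seen[base] += 1
        features.extend(NCP[base])
        features.append(seen[base] / position)
    return features


def binary_encoding(seq):
    """One-hot (mono-nucleotide binary) code, four bits per position."""
    features = []
    for base in seq:
        features.extend(1 if base == b else 0 for b in BASES)
    return features


def encode(seq, k=2):
    """Concatenate the three feature groups into one flat vector."""
    seq = seq.upper()
    return kmer_frequencies(seq, k) + ncp_with_density(seq) + binary_encoding(seq)
```

For a sequence of length L, `encode` yields 4^k + 4L + 4L values. A vector of this kind could then be passed to a random forest classifier, for example scikit-learn's `RandomForestClassifier`, although the abstract does not say which implementation or hyperparameters were used.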

Suggested Citation

Lv, Hao and Dao, Fu-Ying and Zhang, Dan and Guan, Zheng-Xing and Yang, Hui and Su, Wei and Liu, Meng-Lu and Ding, Hui and Chen, Wei and Lin, Hao, iDNA-MS: An Integrated Computational Tool for Detecting DNA Modification Sites in Multiple Genomes. Available at SSRN: https://ssrn.com/abstract=3543840 or http://dx.doi.org/10.2139/ssrn.3543840
This is a paper under consideration at Cell Press and has not been peer-reviewed.

Hao Lv

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Chengdu
China

Fu-Ying Dao

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Chengdu
China

Dan Zhang

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Chengdu
China

Zheng-Xing Guan

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Chengdu
China

Hui Yang

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Chengdu
China

Wei Su

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Chengdu
China

Meng-Lu Liu

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Chengdu
China

Hui Ding

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Chengdu
China

Wei Chen

North China University of Science and Technology - Center for Genomics and Computational Biology

Tangshan, 063000
China

Hao Lin (Contact Author)

University of Electronic Science and Technology of China (UESTC) - Key Laboratory for Neuro-Information (MOE)

Chengdu
China
