Cluster-Based Text Mining for Extracting Drug Candidates for the Prevention of COVID-19 from Biomedical Literature

31 Pages Posted: 20 Apr 2022

Ahmad Afif Supianto Supianto

National Research and Innovation Agency (BRIN)

Rizky Nurdiansyah

affiliation not provided to SSRN

Chia-Wei Weng

Chung Shan Medical University

Vicky Zilvan

National Research and Innovation Agency (BRIN)

Raden Sandra Yuwana

National Research and Innovation Agency (BRIN)

Andria Arisal

National Research and Innovation Agency (BRIN)

Hilman Ferdinandus Pardede

National Research and Innovation Agency (BRIN)

Min-Min Lee

Asia University

Chien-Hung Huang

National Formosa University

Ka-Lok Ng

Asia University

Abstract

Background and Objective: The COVID-19 health crisis, which began at the end of 2019, set researchers around the world racing to find effective solutions. The related literature grew explosively, making an automated approach, namely text mining, necessary for finding useful information to combat COVID-19, especially for drug candidate discovery. While most text mining methods for finding drug candidates extract bioentity associations from PubMed, very few of them use a clustering approach.

Methods: This research was conducted in four main stages. First, the text mining stage uses BioBERT to obtain a vector representation of each word in the sentences of the texts. Second, disease-drug associations are generated from the correspondence between diseases and drugs. Third, the clustering stage groups the rules by disease similarity, using TF-IDF as features. Finally, the drug candidate extraction stage leverages the PubChem and DrugBank databases. We further use the docking package AutoDock Vina in the PyRx software to verify the results.
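The clustering stage can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, a smoothed TF-IDF weighting, cosine distance, average linkage, and a distance threshold of 0.8, none of which are specified in the abstract. The disease-drug rules and the names `RULES`, `tfidf_vectors`, `agglomerative`, and `cluster_rules` are hypothetical placeholders chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical disease-drug rules (toy placeholders, not data from the paper).
RULES = [
    ("severe acute respiratory syndrome", "drug_A"),
    ("acute respiratory distress syndrome", "drug_B"),
    ("viral pneumonia", "drug_C"),
    ("bacterial pneumonia", "drug_D"),
]


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (as a dict) per non-empty document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two L2-normalised sparse vectors."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering over document indices.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_rules(rules, threshold=0.8):
    """Group (disease, drug) rules whose disease names are textually similar."""
    vectors = tfidf_vectors([disease for disease, _ in rules])
    return [[rules[i] for i in sorted(c)] for c in agglomerative(vectors, threshold)]


if __name__ == "__main__":
    for group in cluster_rules(RULES):
        print(group)
```

On the toy rules, the two respiratory-syndrome entries share three tokens and merge first, the two pneumonia entries merge next, and the two resulting groups share no tokens, so merging stops there. Drugs attached to diseases in the same group could then be pooled as candidates for the lookup stage.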

Results: Comparative analysis shows that mining with clustering yields a higher percentage of findings than mining without clustering in all experimental settings. In addition, the drug docking analysis suggests that the top three drugs/phytochemicals may be effective in preventing COVID-19.

Conclusions: The proposed text mining method, which applies clustering, is promising for discovering drug candidates for the prevention of COVID-19 from the biomedical literature.

Note:
Funding Information: The work of Dr. Ahmad Afif Supianto, Vicky Zilvan, Raden Sandra Yuwana, Andria Arisal, and Dr. Hilman Ferdinandus Pardede is supported by the National Research and Innovation Agency of Indonesia. Dr. Chia-Wei Weng's work is supported by the Taiwan Ministry of Science and Technology (MOST) (grant number: MOST 108-2314-B040-034-MY3). Dr. Chien-Hung Huang's work is supported by MOST (grant number: MOST 109-2221-E-150-036). Dr. Ka-Lok Ng's work is supported by MOST (grant number: MOST 109-2221-E-468-013) and by Asia University (grant number: ASIA-110-CMUH-12).

Conflict of Interests: None to declare.

Keywords: Coronavirus, COVID-19, SARS-CoV-2, Text Mining, Hierarchical clustering, Drug docking, Phytochemicals

Suggested Citation

Supianto, Ahmad Afif Supianto and Nurdiansyah, Rizky and Weng, Chia-Wei and Zilvan, Vicky and Yuwana, Raden Sandra and Arisal, Andria and Pardede, Hilman Ferdinandus and Lee, Min-Min and Huang, Chien-Hung and Ng, Ka-Lok, Cluster-Based Text Mining for Extracting Drug Candidates for the Prevention of COVID-19 from Biomedical Literature. Available at SSRN: https://ssrn.com/abstract=4088406 or http://dx.doi.org/10.2139/ssrn.4088406

Ahmad Afif Supianto Supianto

National Research and Innovation Agency (BRIN)

Tangerang Selatan, 15314
Indonesia

Rizky Nurdiansyah

affiliation not provided to SSRN

No Address Available

Chia-Wei Weng

Chung Shan Medical University

No. 110, Section 1
Jianguo N Rd, South District
Taichung City, 40201
Taiwan

Vicky Zilvan

National Research and Innovation Agency (BRIN)

Tangerang Selatan, 15314
Indonesia

Raden Sandra Yuwana

National Research and Innovation Agency (BRIN)

Tangerang Selatan, 15314
Indonesia

Andria Arisal

National Research and Innovation Agency (BRIN)

Tangerang Selatan, 15314
Indonesia

Hilman Ferdinandus Pardede

National Research and Innovation Agency (BRIN)

Tangerang Selatan, 15314
Indonesia

Min-Min Lee

Asia University

Tokyo 180-8629
Japan

Chien-Hung Huang

National Formosa University

64 Wenhwa Road
Yunlin 632, Taiwan, ROC
Taiwan

Ka-Lok Ng (Contact Author)

Asia University

Taichung
Taiwan
