Cluster-Based Text Mining for Extracting Drug Candidates for the Prevention of COVID-19 from Biomedical Literature
31 Pages Posted: 20 Apr 2022
Abstract
Background and Objective: The COVID-19 health crisis, which began at the end of 2019, prompted researchers around the world to race to find effective solutions. The related literature grew rapidly, making an automated approach, namely text mining, necessary for finding useful information to combat COVID-19, especially for drug candidate discovery. While text mining methods for finding drug candidates mostly extract bioentity associations from PubMed, very few of them use a clustering approach.
Methods: This research was conducted in four main stages. First, the text mining stage uses BioBERT to obtain a vector representation of each word in the sentences of the texts. Second, disease-drug associations are generated from the correspondence between diseases and drugs. Third, the clustering stage groups the rules by disease similarity, using TF-IDF as features. Finally, the drug candidate extraction stage leverages the PubChem and DrugBank databases. We further use the docking package AutoDock Vina in the PyRx software to verify the results.
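The clustering stage can be illustrated with a minimal sketch. It assumes that each rule is a (disease, drug) pair, that TF-IDF weights are computed over the tokens of the disease phrase, and that rules are merged by average-linkage agglomerative clustering on cosine similarity. The toy rules, the placeholder drug names, the smoothed IDF formula, the linkage choice, and the 0.2 threshold are all assumptions made for illustration; the abstract does not specify them, and the BioBERT extraction stage is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy rules (disease phrase, drug placeholder); not data from the paper.
rules = [
    ("covid-19 pneumonia", "drug_A"),
    ("sars-cov-2 pneumonia", "drug_B"),
    ("viral pneumonia", "drug_C"),
    ("type 2 diabetes", "drug_D"),
    ("diabetes mellitus", "drug_E"),
]

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Relative term frequency times a smoothed IDF (assumed variant).
        vectors.append({
            term: (count / len(doc)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def agglomerate(vectors, threshold):
    """Average-linkage agglomerative clustering on cosine similarity.

    Starts with one cluster per vector and repeatedly merges the most
    similar pair until no pair reaches the similarity threshold.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best_sim, (x, y) = max(
            (linkage(clusters[x], clusters[y]), (x, y))
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best_sim < threshold:
            break
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters

# Cluster the rules by the similarity of their disease phrases.
docs = [disease.lower().split() for disease, _ in rules]
clusters = agglomerate(tfidf_vectors(docs), threshold=0.2)
for members in clusters:
    print([rules[i] for i in members])
```

In this sketch the drugs attached to the rules of one cluster would form a shared candidate pool for every disease in that cluster, which is one plausible reading of how clustering can surface candidates that mining individual rules would miss.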
Results: Comparative analysis shows that the percentage of findings obtained by mining with clustering exceeds that of mining without clustering in all experimental settings. In addition, the drug docking analysis suggests that the top three drugs/phytochemicals may be effective in preventing the new coronavirus.
Conclusions: The proposed text mining method, which applies clustering, is promising for discovering drug candidates for the prevention of COVID-19 from the biomedical literature.
Note:
Funding Information: The work of Dr. Ahmad Afif Supianto, Vicky Zilvan, Raden Sandra Yuwana, Andria Arisal, and Dr. Hilman Ferdinandus Pardede is supported by the National Research and Innovation Agency of Indonesia. Dr. Chia-Wei Weng's work is supported by the Taiwan Ministry of Science and Technology (MOST) (grant number: MOST 108-2314-B040-034-MY3). Dr. Chien-Hung Huang's work is supported by MOST (grant number: MOST 109-2221-E-150-036). Dr. Ka-Lok Ng's work is supported by MOST (grant number: MOST 109-2221-E-468-013) and by Asia University (grant number: ASIA-110-CMUH-12).
Conflict of Interests: None to declare.
Keywords: Coronavirus, COVID-19, SARS-CoV-2, Text Mining, Hierarchical clustering, Drug docking, Phytochemicals