Accelerating Discoveries in Medicine Using Distributed Vector Representations of Words
26 Pages Posted: 10 Oct 2023
Date Written: September 26, 2023
Abstract
Over the years, several neural network architectures have been proposed to process and represent texts using dense vectors known as word embeddings: mathematical representations that encode the meaning of words or phrases. Word embeddings can be computed by many different algorithms, usually trained on large amounts of textual data to capture semantic relationships between words. These embeddings have revolutionized many Natural Language Processing applications, enabling more accurate and nuanced language understanding. Recently, it was demonstrated that word embeddings can be employed to uncover latent knowledge, i.e., information that is implicit in a set of texts and would hardly be perceptible to humans. In this context, this study extends that strategy by combining different unsupervised models to accelerate discoveries in medicine. Our word embeddings were trained on a large corpus of medical papers related to Acute Myeloid Leukemia, a highly malignant form of cancer, and our study shows that established therapies could have been developed before they were first proposed: our system issued treatment-testing notifications up to 11 years in advance. The results demonstrate that latent knowledge can be uncovered from the biomedical literature to enable faster and more efficient drug testing for medical discoveries.
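The abstract does not specify the similarity mechanism used by the system, but the core idea behind latent-knowledge discovery with word embeddings can be sketched as follows: terms from the literature are mapped to dense vectors, and candidate treatments are ranked by cosine similarity to a disease term, so that a high score for an untested drug-disease pair can flag a candidate for testing. The four-dimensional vectors and the candidate list below are purely illustrative placeholders (real models use hundreds of dimensions trained on millions of papers).

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings for illustration only; these values are
# NOT from the paper's trained model.
embeddings = {
    "acute_myeloid_leukemia": [0.9, 0.1, 0.4, 0.2],
    "cytarabine":             [0.8, 0.2, 0.5, 0.1],  # hypothetical candidate
    "aspirin":                [0.1, 0.9, 0.2, 0.7],  # hypothetical candidate
}

def rank_candidates(disease, candidates, emb):
    # Rank candidate terms by similarity to the disease vector:
    # a high score for an untested pair may indicate latent knowledge.
    scores = {c: cosine_similarity(emb[disease], emb[c]) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_candidates("acute_myeloid_leukemia",
                          ["cytarabine", "aspirin"], embeddings)
print(ranking[0][0])  # prints the candidate closest in embedding space
```

In practice, a system like the one described would run such a ranking over historical snapshots of the corpus, checking whether a therapy's vector was already close to the disease's vector years before the therapy was first proposed.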
Note:
Funding declaration: This work was supported by the Brazilian agencies FAPESP (grant 2021/13054-8), CAPES, and CNPq.
Conflict of Interest: The authors declare no conflict of interest.
Keywords: Distributed vector representations, Word embeddings, Knowledge discovery in databases, Natural language processing, AI in medicine