Accelerating Discoveries in Medicine Using Distributed Vector Representations of Words

26 Pages. Posted: 10 Oct 2023


Matheus Berto

Federal University of São Carlos

Breno Freitas

Shopify Inc.

Carolina Scarton

University of Sheffield

João Agostinho Machado-Neto

University of São Paulo (USP)

Tiago Almeida

Federal University of São Carlos

Date Written: September 26, 2023

Abstract

Over the years, several neural network architectures have been proposed to process and represent texts using dense vectors known as word embeddings: mathematical representations that encode the meaning of words or phrases. Word embeddings can be computed by many different algorithms, usually trained on large amounts of text to capture semantic relationships between words. These embeddings have revolutionized many Natural Language Processing applications, enabling more accurate and nuanced language understanding. Recently, it was demonstrated that word embeddings can be used to uncover latent knowledge, i.e., information that is implicit in a set of texts and that would be hard for human readers to perceive. This study extends that strategy by combining different unsupervised models to accelerate discoveries in medicine. Our word embeddings were trained on a large corpus of medical papers related to Acute Myeloid Leukemia, a highly malignant form of cancer. Our study shows that established therapies could have been developed before they were first proposed: our system issued notifications recommending that these treatments be tested up to 11 years in advance. The results show that latent knowledge can be uncovered in the biomedical field to enable faster and more efficient drug testing for medical discoveries.
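As an illustration of the general idea of ranking candidate terms by their embedding similarity to a disease term, the following is a minimal sketch and not the authors' pipeline. It assumes cosine similarity as the scoring function, and the vectors and term names (`aml`, `compound_a`, `compound_b`, `compound_c`) are hypothetical toy values invented for this example; the abstract does not specify the models, scoring, or notification criteria used in the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(embeddings, target, candidates):
    """Return (term, score) pairs sorted by descending similarity to target."""
    target_vec = embeddings[target]
    scored = [(term, cosine(embeddings[term], target_vec)) for term in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, invented for illustration only (assumption).
# Real embeddings would be learned from a corpus and have many more dimensions.
toy_embeddings = {
    "aml": [0.9, 0.1, 0.3],
    "compound_a": [0.8, 0.2, 0.4],
    "compound_b": [0.1, 0.9, 0.2],
    "compound_c": [0.5, 0.5, 0.5],
}

ranking = rank_candidates(
    toy_embeddings, "aml", ["compound_a", "compound_b", "compound_c"]
)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

In a setting like the one the abstract describes, such a ranking could be recomputed on embeddings trained only on papers published up to a given year, so that a candidate rising in the ranking over time might serve as an early signal. That temporal setup is an assumption here, not a detail stated in the abstract.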

Note:
Funding declaration: This work was supported by the Brazilian agencies FAPESP (grant 2021/13054-8), CAPES, and CNPq.

Conflict of Interests: We have no conflict of interest.

Keywords: Distributed vector representations, Word embeddings, Knowledge discovery in databases, Natural language processing, AI in medicine

Suggested Citation

Berto, Matheus and Freitas, Breno and Scarton, Carolina and Machado-Neto, João Agostinho and Almeida, Tiago, Accelerating Discoveries in Medicine Using Distributed Vector Representations of Words (September 26, 2023). Available at SSRN: https://ssrn.com/abstract=4587626

Tiago Almeida (Contact Author)

Federal University of São Carlos
