Improving Clinical Query Intent Interpretation by using Concept Embedding Vectors
Posted: 22 Nov 2021
Date Written: November 22, 2021
Clinicians and medical professionals need accurate, succinct, up-to-date, and trustworthy information from the medical literature in response to the search queries they make during the diagnosis, prognosis, or treatment of a patient at the point of care. The ever-growing volume of medical evidence, complex search queries that require additional context and patient comorbidities, and trust in search results are often cited as critical challenges for focused clinical search. At Elsevier, we have been developing the Focused Clinical Search Service (HGFCSS), which is powered by Elsevier's Healthcare Knowledge Graph (HG), to tackle these challenges and facilitate the retrieval of relevant medical information from synoptic medical content and medical textbooks. HG contains medical knowledge (concepts, relations, cohorts, etc.) that is regularly curated by subject matter experts, as well as novel clinical relations that natural language processing (NLP) models extract from unstructured medical content through automated pipelines.
The HGFCSS identifies the core HG medical concepts in a clinical search query, as well as any additional refinements proposed by the user (e.g., treatment, diagnosis) in that query. In this talk, we will present our ongoing research on the use of concept embedding vectors to improve clinical query intent interpretation in HGFCSS. Concept embedding vectors represent HG medical concepts in a high-dimensional numerical vector space and are generated based on concept co-occurrences in medical literature. Concepts separated by a minimal distance in the vector space are deemed to be similar to each other, even when there are no explicit typed relations between them in the knowledge graph. We use the concept embedding vectors to identify and select similar HG concepts for a set of parsed query concepts, subject to certain semantic criteria and content filters, and to improve query expansion strategies. Through this approach, we can improve search performance in cases where the knowledge graph has limited trusted relations and medical information (e.g., the query "ABCD2 score" will retrieve the document on "Transient Ischemic Attack" because the two concepts are close to each other in the vector space).
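The expansion step described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the three-dimensional toy vectors, the semantic type labels, the use of cosine similarity as the closeness measure, the threshold value, and the function name `expand_query` are all hypothetical and are not taken from HGFCSS.

```python
import math

# Toy, made-up embedding vectors (assumption). Real concept embeddings would be
# high-dimensional and learned from concept co-occurrences in medical literature.
EMBEDDINGS = {
    "ABCD2 score": [0.9, 0.1, 0.3],
    "Transient Ischemic Attack": [0.8, 0.2, 0.4],
    "Stroke": [0.7, 0.3, 0.5],
    "Asthma": [0.1, 0.9, 0.2],
}

# Hypothetical semantic types, standing in for the "semantic criteria" filter.
SEMANTIC_TYPE = {
    "ABCD2 score": "clinical_score",
    "Transient Ischemic Attack": "disease",
    "Stroke": "disease",
    "Asthma": "disease",
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_query(concept, allowed_types, min_similarity=0.9, top_k=2):
    """Return up to top_k (concept, similarity) pairs closest to `concept`.

    Candidates must have an allowed semantic type and reach the
    similarity threshold; results are sorted from most to least similar.
    """
    query_vec = EMBEDDINGS[concept]
    scored = []
    for other, vec in EMBEDDINGS.items():
        if other == concept or SEMANTIC_TYPE[other] not in allowed_types:
            continue
        sim = cosine_similarity(query_vec, vec)
        if sim >= min_similarity:
            scored.append((other, sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for name, sim in expand_query("ABCD2 score", allowed_types={"disease"}):
        print(f"{name}: {sim:.3f}")
```

With these toy vectors, "Transient Ischemic Attack" ranks first for the query concept "ABCD2 score", while "Asthma" falls below the threshold and is excluded. A production system would likely replace the linear scan with an approximate nearest-neighbor index.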
We have been evaluating the search performance of HGFCSS in retrieving relevant content excerpts for a set of focused clinical search queries by measuring normalized discounted cumulative gain (nDCG) and other search metrics. By using concept embedding vectors to improve clinical query intent interpretation, we improved the average nDCG score of the HGFCSS from 0.54 to 0.68 on the training set and from 0.52 to 0.63 on the test set. In the future, we plan to extend the use of concept embedding vectors to improve query understanding and parsing for complex, multi-concept clinical queries.
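For readers unfamiliar with the metric, the following sketch shows one common formulation of nDCG, with linear gain and a log2 rank discount, averaged over queries. The abstract does not state which nDCG variant or rank cutoff was used in the evaluation, so the formulation, the function names, and the relevance grades below are illustrative assumptions only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each graded relevance is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """nDCG of one ranked result list, given graded relevance per position.

    `relevances` is in the order the search system returned the results;
    `k` is an optional rank cutoff (assumption: no cutoff by default).
    """
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)
    ideal = ideal[:k] if k else ideal
    ideal_dcg = dcg(ideal)
    # A query with no relevant results scores 0 by convention here.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(per_query_relevances, k=None):
    """Average nDCG over a set of queries."""
    scores = [ndcg(rels, k) for rels in per_query_relevances]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up relevance grades for two queries (not data from the study).
    queries = [[3, 2, 0, 1], [0, 1, 2]]
    print(f"mean nDCG: {mean_ndcg(queries):.3f}")
```

A perfectly ordered result list scores 1.0, and the score drops as relevant excerpts are pushed further down the ranking.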
Keywords: Knowledge Graph, Clinical Search, Information Retrieval, Natural Language Processing, Automation, Concept Embedding