Preprints with The Lancet is a collaboration between The Lancet Group of journals and SSRN to facilitate the open sharing of preprints for early engagement, community comment, and collaboration. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early-stage research papers that have not been peer-reviewed. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. The findings should not be used for clinical or public health decision-making or presented without highlighting these facts. For more information, please see the FAQs.

Development and Testing of Retrieval Augmented Generation in Large Language Models

27 Pages Posted: 9 Feb 2024

Yu He Ke

National University of Singapore (NUS) - Singapore General Hospital

Liyuan Jin

National University of Singapore (NUS) - Duke-NUS Medical School

Kabilan Elangovan

Singapore National Eye Centre

Hairil Abdullah

National University of Singapore (NUS) - Singapore General Hospital

Nan Liu

Duke-National University of Singapore Medical School - Centre for Quantitative Medicine

Alex Tiong Heng Sia

National University of Singapore (NUS) - Duke-NUS Medical School

Chai Rick Soh

National University of Singapore (NUS) - Singapore General Hospital

Joshua Yi Min Tung

National University of Singapore (NUS) - Singapore General Hospital

Jasmine Chiat Ling Ong

National University of Singapore (NUS) - Singapore General Hospital

Daniel Shu Wei Ting

Singapore National Eye Centre - Artificial Intelligence for Ophthalmology

Abstract

Purpose: Large Language Models (LLMs) hold significant promise for medical applications. Yet, their practical implementation often falls short in incorporating current, guideline-grounded knowledge specific to clinical specialties and tasks. Additionally, conventional accuracy-enhancing methods like fine-tuning pose considerable computational challenges.  

Retrieval Augmented Generation (RAG) is a promising approach for customizing domain knowledge in LLMs and is particularly well suited to healthcare implementations. This case study presents the development and evaluation of an LLM-RAG pipeline tailored for healthcare, focusing specifically on preoperative medicine. The primary endpoints were the accuracy and safety of the responses generated by the LLM-RAG system.

Methods: We developed an LLM-RAG model using 35 preoperative guidelines and tested it against human-generated responses, with a total of 1260 responses evaluated (336 human-generated, 336 LLM-generated, and 588 LLM-RAG-generated).  

The RAG process involved converting clinical documents into text using Python-based frameworks such as LangChain and LlamaIndex, then splitting the text into chunks for embedding and retrieval. Vector storage techniques and embedding models were selected to optimize data retrieval: Pinecone was used for vector storage, with an embedding dimensionality of 1536 and cosine similarity as the similarity metric. The LLMs evaluated were GPT3.5, GPT4.0, Llama2-7B, and Llama2-13B, together with their LLM-RAG counterparts.
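The chunk, embed, and retrieve steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `embed` function is a toy hashed bag-of-words stand-in for a real 1536-dimensional embedding model, the in-memory list stands in for a Pinecone index, and the chunk sizes, function names, and placeholder texts are all assumptions made for illustration.

```python
import math
import re
import zlib


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text, dim=1536):
    """Toy embedding: hashed bag-of-words counts (NOT a real embedding model)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k (score, chunk) pairs ranked by cosine similarity."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Placeholder chunks only; these are not clinical guideline text.
example_chunks = [
    "placeholder section about fasting before surgery",
    "placeholder section about medication review",
    "placeholder section about airway assessment",
]
top_matches = retrieve("fasting before surgery", example_chunks, top_k=1)
```

In a RAG setup of this kind, the retrieved chunks would be placed into the LLM prompt as context; that step is omitted here because the abstract does not describe the prompt format.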

We evaluated the system using 14 de-identified clinical scenarios, focusing on six key aspects of preoperative instructions. The correctness of the responses was determined based on established guidelines and expert panel review. Human-generated answers, provided by junior doctors, were used as a comparison. Comparative analysis was conducted using Cohen’s h and chi-square tests.

Results: The LLM-RAG model generated answers in an average of 15-20 seconds, significantly faster than the 10 minutes typically required by humans. Among the basic LLMs, GPT4.0 had the highest accuracy, at 80.1%. This increased to 91.4% when the model was enhanced with RAG. Compared with the human-generated instructions, which had an accuracy of 86.3%, the GPT4.0-RAG model demonstrated non-inferiority (p=0.610).
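Cohen's h, the effect size named in the Methods, is the difference between two arcsine-transformed proportions. The sketch below applies the standard formula to the accuracies reported above. Which pairs the authors actually compared with Cohen's h is not stated in the abstract, so the two comparisons and the variable names here are assumptions.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions p1 and p2."""
    phi1 = 2.0 * math.asin(math.sqrt(p1))
    phi2 = 2.0 * math.asin(math.sqrt(p2))
    return phi1 - phi2


# Accuracies taken from the abstract, expressed as proportions.
ACC_GPT4_RAG = 0.914
ACC_GPT4_BASE = 0.801
ACC_HUMAN = 0.863

# Assumed comparisons, chosen for illustration only.
h_rag_vs_human = cohens_h(ACC_GPT4_RAG, ACC_HUMAN)
h_rag_vs_base = cohens_h(ACC_GPT4_RAG, ACC_GPT4_BASE)

print(f"GPT4.0-RAG vs human: h = {h_rag_vs_human:.3f}")
print(f"GPT4.0-RAG vs GPT4.0: h = {h_rag_vs_base:.3f}")
```

By Cohen's conventional benchmarks, h values near 0.2, 0.5, and 0.8 are read as small, medium, and large effects respectively.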

Conclusions: In this case study, we demonstrated an LLM-RAG model for healthcare implementation. The model can generate complex preoperative instructions across different clinical tasks with accuracy non-inferior to humans and low rates of hallucination. The pipeline shows the advantages of grounded knowledge, upgradability, and scalability as important aspects of healthcare LLM deployment.

Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors.

Declaration of Interest: None to declare.

Keywords: Large language model, artificial intelligence, Retrieval-augmented generation

Suggested Citation

Ke, Yu He and Jin, Liyuan and Elangovan, Kabilan and Abdullah, Hairil and Liu, Nan and Sia, Alex Tiong Heng and Soh, Chai Rick and Tung, Joshua Yi Min and Ong, Jasmine Chiat Ling and Ting, Daniel Shu Wei, Development and Testing of Retrieval Augmented Generation in Large Language Models. Available at SSRN: https://ssrn.com/abstract=4719185 or http://dx.doi.org/10.2139/ssrn.4719185

Yu He Ke

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Liyuan Jin

National University of Singapore (NUS) - Duke-NUS Medical School

Kabilan Elangovan

Singapore National Eye Centre

Singapore

Hairil Abdullah

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Nan Liu

Duke-National University of Singapore Medical School - Centre for Quantitative Medicine

8 College Rd.
Singapore, 169857
Singapore
+65 6601 6503 (Phone)

Alex Tiong Heng Sia

National University of Singapore (NUS) - Duke-NUS Medical School

Chai Rick Soh

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Joshua Yi Min Tung

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Jasmine Chiat Ling Ong

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Daniel Shu Wei Ting (Contact Author)

Singapore National Eye Centre - Artificial Intelligence for Ophthalmology

Singapore
