
Preprints with The Lancet is a collaboration between The Lancet Group of journals and SSRN to facilitate the open sharing of preprints for early engagement, community comment, and collaboration. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early-stage research papers that have not been peer-reviewed. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. The findings should not be used for clinical or public health decision-making or presented without highlighting these facts. For more information, please see the FAQs.

Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness

45 Pages. Posted: 2 Jul 2024


Yu He Ke

National University of Singapore (NUS) - Singapore General Hospital

Liyuan Jin

National University of Singapore (NUS) - Duke-NUS Medical School

Kabilan Elangovan

Singapore National Eye Centre

Hairil Abdullah

National University of Singapore (NUS) - Singapore General Hospital

Nan Liu

Duke-National University of Singapore Medical School - Centre for Quantitative Medicine

Alex Tiong Heng Sia

National University of Singapore (NUS) - Duke-NUS Medical School

Chai Rick Soh

National University of Singapore (NUS) - Singapore General Hospital

Joshua Yi Min Tung

National University of Singapore (NUS) - Singapore General Hospital

Jasmine Chiat Ling Ong

National University of Singapore (NUS) - Singapore General Hospital

Chang-Fu Kuo

Chang Gung University - Chang Gung Memorial Hospital

Shao-Chun Wu

Chang Gung University - Chang Gung Memorial Hospital

Vesela P. Kovacheva

Harvard University - Harvard Medical School

Daniel Shu Wei Ting

Singapore National Eye Centre - Artificial Intelligence for Ophthalmology


Abstract

Purpose: Large Language Models (LLMs) show potential for medical applications but often lack the specialized knowledge that clinical tasks require. Retrieval Augmented Generation (RAG) is a promising approach for healthcare because it allows LLMs to be customized with domain-specific knowledge. We assessed the accuracy, consistency and safety of LLM-RAG models in determining a patient's fitness for surgery and in providing additional crucial preoperative instructions.

Methods: We developed LLM-RAG models using 35 local and 23 international preoperative guidelines and tested them against human-generated responses; a total of 3682 responses were evaluated. Clinical documents were processed, stored, and retrieved using LlamaIndex. Ten LLMs (GPT-3.5, GPT-4, GPT-4o, Llama2-7B, Llama2-13B, Llama2-70B, Llama3-8B, Llama3-70B, Gemini-1.5-Pro and Claude-3-Opus) were evaluated in three configurations: 1) the native model, 2) the model with local preoperative guidelines, and 3) the model with international preoperative guidelines. Fourteen clinical scenarios were assessed, focusing on 7 aspects of preoperative instructions. Correct responses were determined by established guidelines and expert physician judgment. Answers from senior attending anesthesiologists and junior doctors served as the human comparison. Comparative analysis used Fisher's exact test, and inter-rater agreement was assessed within human and LLM responses.
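
The retrieve-then-generate idea behind the LLM-RAG models can be illustrated with a minimal plain-Python sketch. This is not the authors' LlamaIndex pipeline: it swaps in a simple bag-of-words cosine similarity for the embedding-based retrieval such libraries typically use (an assumption), and the guideline chunks, function names and prompt wording are all hypothetical placeholders.

```python
import math
import re
from collections import Counter

# Hypothetical guideline chunks; real chunks would come from the processed guideline documents.
GUIDELINE_CHUNKS = [
    "Fasting instructions before elective surgery (placeholder text)",
    "Management of anticoagulant medication before surgery (placeholder text)",
    "Criteria for referral to a preoperative assessment clinic (placeholder text)",
]


def tokenize(text: str) -> Counter:
    """Lower-case bag-of-words term counts (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (hypothetical helper)."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the question in retrieved guideline text before it is sent to an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the preoperative guideline excerpts below.\n"
        f"Guidelines:\n{joined}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    question = "fasting instructions before surgery"
    print(build_prompt(question, retrieve(question, GUIDELINE_CHUNKS, k=1)))
```

Swapping the guideline set (local vs. international) only changes the chunk list, not the model, which is the property the abstract credits for upgradability.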

Results: The LLM-RAG model was efficient, generating answers within 20 seconds, with guideline retrieval taking less than 5 seconds; this is considerably faster than the 10 minutes that clinicians typically estimate for this task. Notably, the GPT-4 LLM-RAG model achieved the highest accuracy in assessing fitness for surgery, surpassing human-generated responses (96.4% vs. 86.6%, p=0.016). The RAG models generalized well, performing similarly with both international and local guidelines. Additionally, the GPT-4 LLM-RAG model produced no hallucinations and generated correct preoperative instructions comparable to those written by clinicians.
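
The reported p-value comes from Fisher's exact test, which for two responders compares a 2x2 table of correct and incorrect answers. As a minimal sketch (the function name is hypothetical, and the abstract does not give the underlying counts, so none are reproduced here), the two-sided test sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one, the two-sided convention documented for SciPy's `scipy.stats.fisher_exact`.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be two responders (e.g. an LLM-RAG model and clinicians),
    columns correct / incorrect answers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more likely than the observed one; the small relative
    # tolerance keeps floating-point ties from being dropped.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # Illustrative table only, not data from this study.
    print(fisher_exact_two_sided(3, 1, 1, 3))  # ≈ 0.4857 (= 34/70)
```

For real analyses a maintained implementation such as `scipy.stats.fisher_exact` is preferable; the sketch only makes the computation explicit.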

Conclusions: This study successfully implements LLM-RAG models for preoperative healthcare tasks, emphasizing the benefits of grounded knowledge, upgradability, and scalability for effective deployment in healthcare settings.

Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors.

Declaration of Interest: None declared.

Ethical Approval: This study involved the analysis of de-identified patient data. As the study did not involve the collection, use, or disclosure of identifiable private information, and the data were not collected through interaction or intervention with individuals specifically for research purposes, it was determined that IRB oversight was not required. All data used were accessed in compliance with applicable privacy laws and institutional policies.

Keywords: Large language model, Artificial intelligence, Retrieval-augmented generation

Suggested Citation

Ke, Yu He and Jin, Liyuan and Elangovan, Kabilan and Abdullah, Hairil and Liu, Nan and Sia, Alex Tiong Heng and Soh, Chai Rick and Tung, Joshua Yi Min and Ong, Jasmine Chiat Ling and Kuo, Chang-Fu and Wu, Shao-Chun and Kovacheva, Vesela P. and Ting, Daniel Shu Wei, Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness. Available at SSRN: https://ssrn.com/abstract=4881310 or http://dx.doi.org/10.2139/ssrn.4881310

Yu He Ke

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Liyuan Jin

National University of Singapore (NUS) - Duke-NUS Medical School

Kabilan Elangovan

Singapore National Eye Centre

Singapore

Hairil Abdullah

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Nan Liu

Duke-National University of Singapore Medical School - Centre for Quantitative Medicine

8 College Rd.
Singapore, 169857
Singapore
+65 6601 6503 (Phone)

Alex Tiong Heng Sia

National University of Singapore (NUS) - Duke-NUS Medical School

Chai Rick Soh

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Joshua Yi Min Tung

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Jasmine Chiat Ling Ong

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Chang-Fu Kuo

Chang Gung University - Chang Gung Memorial Hospital

Taiwan

Shao-Chun Wu

Chang Gung University - Chang Gung Memorial Hospital

No.199, Tunghwa Rd
Taipei
Taiwan

Vesela P. Kovacheva

Harvard University - Harvard Medical School

25 Shattuck St
Boston, MA 02115
United States

Daniel Shu Wei Ting (Contact Author)

Singapore National Eye Centre - Artificial Intelligence for Ophthalmology

Singapore

