Preprints with The Lancet is a collaboration between The Lancet Group of journals and SSRN to facilitate the open sharing of preprints for early engagement, community comment, and collaboration. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early-stage research papers that have not been peer-reviewed. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. The findings should not be used for clinical or public health decision-making or presented without highlighting these facts. For more information, please see the FAQs.

Evaluation of LLMs Accuracy and Application in Oncology Principles and Practice

20 Pages Posted: 10 Mar 2025

Jinghao Liang

Guangzhou Medical University - Guangzhou Institute of Respiratory Health

Yijian Lin

Guangdong Medical University

Zhihua Guo

Guangzhou Medical University - Guangzhou Institute of Respiratory Health

Hengrui Liang

Guangzhou Medical University - State Key Laboratory of Respiratory Disease

Jingchun Ni

Guangdong Medical University

Dianhan Lin

Shantou University

Jihao Qi

Guangdong Medical University

Zishan Huang

Guangzhou Medical University

Wei Wang

Guangzhou Medical University - Department of Thoracic Surgery and Oncology

Jianxing He

Guangzhou Medical University - Department of Thoracic Surgery and Oncology

Abstract

Background: In recent years, large language models (LLMs) have offered physicians and patients a new avenue for tumor diagnosis and treatment. Our study assessed 16 LLMs, including ChatGPT, DeepSeek, Claude, Grok, and Llama, focusing on their diagnostic precision and answer comprehensibility in oncology-related inquiries and on performance variations among the models.

Methods: We developed 549 single-choice/true-false questions and 10 short-answer questions to evaluate clinical oncology knowledge, based on standard textbooks, guidelines, and literature. Five participant groups took part in diagnostic and treatment testing in oncology and thoracic oncology: attending physicians, resident physicians, academic professionals, the general public, and LLMs. Sixteen generative LLMs, each adopting an oncologist persona, answered the questions independently. Readability was measured with the Flesch Reading Ease Score (FRES). Three consultant-level oncology specialists independently rated the LLMs' responses to the short-answer questions on a 3-point accuracy scale.
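As an illustration of the readability measure named above, the standard Flesch formula is FRES = 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), where higher scores indicate text that is easier to read. The following is a minimal sketch only, not the authors' pipeline: the function names `count_syllables` and `flesch_reading_ease` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for simplicity, so scores may differ from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption of this sketch): count vowel groups
    and drop one for a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease Score for English text."""
    # Sentences are split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one apostrophe part.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
```

Calling `flesch_reading_ease(response_text)` on each model response and averaging per model would give one way to compare readability across models; whether the study aggregated scores this way is not stated in the abstract.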

Findings: GPT o1, DeepSeek-R1, and GPT o3-mini achieved the highest overall accuracy (90.16% to 89.44%), while Llama-3.2-1B performed lowest at 32.42%. GPT o1 demonstrated the highest accuracy in management of tumor complications and emergencies and in hematologic tumors (97.50% and 94.44%, respectively), while DeepSeek-R1 excelled in molecular biology of cancer (94.44%) and achieved perfect accuracy (100%) in cancer pain management. The LLM group outperformed all other groups with an accuracy of 89.39% in the comprehensive test and performed comparably to attending physicians in thoracic oncology testing (91.6% vs. 89.4%, p = 0.5887). The FRES evaluation showed that DeepSeek-R1 had the highest response readability. Additionally, Grok3 and DeepSeek-R1 outperformed the other models in quality of response, garnering 50% “excellent” ratings.

Interpretation: These results can guide the selection of LLMs for oncology through clinical evaluation of their capabilities and limitations; future refinements may enhance their utility in improving the efficiency and accuracy of cancer care.

Funding: None.

Keywords: Large Language Models, Oncology, Diagnostic Accuracy, Answer Readability

Suggested Citation

Liang, Jinghao and Lin, Yijian and Guo, Zhihua and Liang, Hengrui and Ni, Jingchun and Lin, Dianhan and Qi, Jihao and Huang, Zishan and Wang, Wei and He, Jianxing, Evaluation of LLMs Accuracy and Application in Oncology Principles and Practice. Available at SSRN: https://ssrn.com/abstract=5169443 or http://dx.doi.org/10.2139/ssrn.5169443

Jinghao Liang

Guangzhou Medical University - Guangzhou Institute of Respiratory Health

Yijian Lin

Guangdong Medical University

620 Renmin N Rd, Yuexiu Qu
Guangzhou Shi
Guangdong Sheng, 510000
China

Zhihua Guo

Guangzhou Medical University - Guangzhou Institute of Respiratory Health

Hengrui Liang

Guangzhou Medical University - State Key Laboratory of Respiratory Disease

Guangdong, 510120
China

Jingchun Ni

Guangdong Medical University

620 Renmin N Rd, Yuexiu Qu
Guangzhou Shi
Guangdong Sheng, 510000
China

Dianhan Lin

Shantou University

243 Daxue Road
Shantou, Guangdong, 515063
China

Jihao Qi

Guangdong Medical University

620 Renmin N Rd, Yuexiu Qu
Guangzhou Shi
Guangdong Sheng, 510000
China

Zishan Huang

Guangzhou Medical University

195 Dongfeng W Rd
Yuexiu Qu
Guangzhou Shi, 510080
China

Wei Wang

Guangzhou Medical University - Department of Thoracic Surgery and Oncology

Jianxing He (Contact Author)

Guangzhou Medical University - Department of Thoracic Surgery and Oncology

Guangzhou, 510120
China
