Abstract
Background: Recent advances in large language models (LLMs) such as ChatGPT and LLaMA have shown promise for medical applications, but their performance in medical language understanding still needs improvement. This study aims to develop foundational medical LLMs by training open-source LLaMA models on large-scale, domain-specific datasets to enhance their efficacy across a variety of medical text analysis tasks and a medical diagnosis task.
Methods: We developed Me-LLaMA, a new family of medical LLMs comprising the foundation models Me-LLaMA 13B/70B and their chat-enhanced versions Me-LLaMA 13B/70B-chat, through continual pre-training and instruction tuning of LLaMA2 on both biomedical literature and clinical notes. Me-LLaMA used the largest and most comprehensive medical training data to date, including 129B pre-training tokens and 214K instruction tuning samples from diverse biomedical and clinical data sources; training required substantial computing resources, e.g., over 100,000 A100 GPU hours for the 70B models. We then applied Me-LLaMA to six important biomedical text analysis tasks (Question Answering, Named Entity Recognition, Relation Extraction, Text Classification, Text Summarization, and Natural Language Inference) and evaluated its performance on 12 benchmark datasets. To further assess Me-LLaMA's potential clinical utility, we also evaluated the Me-LLaMA models on a complex clinical case diagnosis task and compared their performance with that of commercial LLMs, using both automatic and human evaluation.
Findings: Our extensive evaluation shows that Me-LLaMA models outperform LLaMA, as well as other existing open-source medical LLMs, in both zero-shot and supervised learning settings for most text analysis tasks. With task-specific instruction tuning, Me-LLaMA models also surpass leading commercial LLMs, including ChatGPT on 7 of 8 datasets and GPT-4 on 5 of 8 datasets. Moreover, on the complex clinical case diagnosis task, Me-LLaMA's performance is comparable to that of ChatGPT and GPT-4.
Interpretation: Domain-specific data is important for building medical foundation LLMs that can improve diverse downstream text analysis tasks and medical applications. The computing costs associated with training medical foundation models are substantial and require careful consideration when selecting among training strategies (i.e., pre-training vs. fine-tuning). Me-LLaMA models are now publicly available through appropriate user agreements, making them a valuable resource for medical AI applications.
Funding: National Institutes of Health (NIH); Patient-Centered Outcomes Research Institute (PCORI).
Declaration of Interest: The authors have no financial or non-financial conflicts of interest to disclose.
Xie, Qianqian; Chen, Qingyu; Chen, Aokun; Peng, Cheng; Hu, Yan; Lin, Fongci; Peng, Xueqing; Huang, Jimin; Zhang, Jeffrey; Keloth, Vipina K.; Zhou, Xinyu; Qian, Lingfei; He, Huan; Shung, Dennis; Ohno-Machado, Lucila; Wu, Yonghui; Xu, Hua; Bian, Jiang. Me-LLaMA: Medical Foundation Large Language Models for Comprehensive Text Analysis and Beyond. Available at SSRN: https://ssrn.com/abstract=4943761 or http://dx.doi.org/10.2139/ssrn.4943761