Prompt Design for Medical Question Answering with Large Language Models
11 Pages · Posted: 8 Mar 2025
Abstract
Large language models (LLMs) are increasingly being evaluated in the medical domain. Given the scarcity of suitable datasets and the difficulty of evaluating free-text outputs, datasets with multiple-choice questions are often used for such studies. We evaluated six large LLMs (from families such as Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3.1, and Mistral) and six smaller models (from families such as Gemma 2B, Mistral Nemo, Llama 3.1, and Gemini 1.5 Flash) across five prompting techniques on neuro-oncology exam questions. Using the established MedQA dataset and a novel neuro-oncology question set, we compared basic prompting, chain-of-thought reasoning, and more complex agent-based methods incorporating external search capabilities. Results showed that the Reasoning and Acting (ReAct) approach, combined with giving the LLM access to Google Search, performed best with large models such as Claude 3.5 Sonnet (81.7% accuracy). However, the performance of prompting techniques varies across foundation models. While large models significantly outperformed smaller open-source ones on the MedQA dataset (79.3% vs. 51.2% accuracy), complex agentic patterns such as Language Agent Tree Search provided minimal benefits despite 5x higher latency. We recommend that practitioners keep experimenting with various techniques for their specific use case and chosen foundation model, and favor simple prompting patterns with large models, as these offer the best balance of accuracy and efficiency.
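For readers unfamiliar with the ReAct pattern referenced above, the sketch below illustrates the interleaved Thought/Action/Observation loop in schematic form. It is a minimal illustration, not the paper's implementation: `call_llm` and `google_search` are hypothetical placeholders for a foundation-model API and a search tool, and the prompt wording and step budget are assumptions.

```python
# Illustrative ReAct loop for multiple-choice medical QA.
# `call_llm` and `google_search` are hypothetical placeholders, not the
# paper's code or any specific library's API.
import re

REACT_PROMPT = """Answer the multiple-choice medical question.
Alternate between:
Thought: reason about what you know and what you need to look up.
Action: Search[query] to query the web, or Finish[letter] to answer.
Observation: the result of your last search (provided to you).

Question: {question}
{transcript}"""

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM, return one Thought/Action step."""
    raise NotImplementedError

def google_search(query: str) -> str:
    """Placeholder: return a short snippet of search results for `query`."""
    raise NotImplementedError

def react_answer(question: str, max_steps: int = 5) -> str | None:
    transcript = ""
    for _ in range(max_steps):
        step = call_llm(REACT_PROMPT.format(question=question,
                                            transcript=transcript))
        transcript += step + "\n"
        finish = re.search(r"Finish\[(.+?)\]", step)
        if finish:
            return finish.group(1)  # model committed to an answer choice
        search = re.search(r"Search\[(.+?)\]", step)
        if search:
            # Feed the search result back as an Observation for the next step.
            transcript += f"Observation: {google_search(search.group(1))}\n"
    return None  # no answer within the step budget
```

The key design point, and the likely source of the accuracy gains the abstract reports, is that the model's reasoning is grounded by fresh search results at each step rather than relying solely on parametric knowledge.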
Note:
Funding Information: The authors did not use any external funding or grants.
Conflict of Interest: The authors do not have any competing interests.
Keywords: Large Language Models, Generative AI, Medical Question-Answering, Agentic AI