A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models – Safety, Consensus & Context, Objectivity, Reproducibility and Explainability

22 Pages
Posted: 22 Nov 2024
Publication Status: Under Review

Ting Fang Tan

Singapore National Eye Centre

Kabilan Elangovan

Singapore National Eye Centre; SingHealth Group

Jasmine Chiat Ling Ong

National University of Singapore (NUS) - Singapore General Hospital

Aaron Lee

University of Washington

Nigam H. Shah

Stanford University - Center for Biomedical Informatics Research

Joseph J. Y. Sung

Nanyang Technological University (NTU) - Lee Kong Chian School of Medicine

Tien Yin Wong

Singapore National Eye Centre - Singapore Eye Research Institute

Xue Lan

Tsinghua University

Nan Liu

Duke-National University of Singapore Medical School - Centre for Quantitative Medicine

Haibo Wang

Sun Yat-Sen University (SYSU), First Affiliated Hospital, Clinical Trial Unit

Chang-Fu Kuo

Chang Gung University - Chang Gung Memorial Hospital

Simon Chesterman

National University of Singapore (NUS) - Faculty of Law

Zee Kin Yeong

Singapore Academy of Law

Daniel Shu Wei Ting

Singapore National Eye Centre - Artificial Intelligence for Ophthalmology

Abstract

A comprehensive qualitative evaluation framework for large language models (LLMs) in healthcare, one that extends beyond accuracy and traditional quantitative metrics, is needed. We propose five key aspects for the evaluation of LLMs: Safety, Consensus & Context, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis of an evaluation framework for future LLM-based models that are safe, reliable, trustworthy, and ethical for healthcare and clinical applications.

Keywords: large language models, healthcare, evaluation, generative artificial intelligence

Suggested Citation

Tan, Ting Fang and Elangovan, Kabilan and Ong, Jasmine Chiat Ling and Lee, Aaron and Shah, Nigam H. and Sung, Joseph J. Y. and Wong, Tien Yin and Lan, Xue and Liu, Nan and Wang, Haibo and Kuo, Chang-Fu and Chesterman, Simon and Yeong, Zee Kin and Ting, Daniel Shu Wei, A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models – Safety, Consensus & Context, Objectivity, Reproducibility and Explainability. Available at SSRN: https://ssrn.com/abstract=5029562 or http://dx.doi.org/10.2139/ssrn.5029562
This version of the paper has not been formally peer reviewed.

Ting Fang Tan

Singapore National Eye Centre

Singapore

Kabilan Elangovan

Singapore National Eye Centre

SingHealth Group

Jasmine Chiat Ling Ong

National University of Singapore (NUS) - Singapore General Hospital

Singapore, 169608
Singapore

Aaron Lee

University of Washington

Seattle, WA 98195
United States

Nigam H. Shah

Stanford University - Center for Biomedical Informatics Research

1265 Welch Road
Stanford, CA
United States

Joseph J. Y. Sung

Nanyang Technological University (NTU) - Lee Kong Chian School of Medicine

Singapore

Tien Yin Wong

Singapore National Eye Centre - Singapore Eye Research Institute

168751
Singapore

Xue Lan

Tsinghua University

Nan Liu

Duke-National University of Singapore Medical School - Centre for Quantitative Medicine

8 College Rd.
Singapore, 169857
Singapore
+65 6601 6503 (Phone)

Haibo Wang

Sun Yat-Sen University (SYSU), First Affiliated Hospital, Clinical Trial Unit

China

Chang-Fu Kuo

Chang Gung University - Chang Gung Memorial Hospital

Taiwan

Simon Chesterman

National University of Singapore (NUS) - Faculty of Law

469G Bukit Timah Road
Eu Tong Sen Building
Singapore, 259776
Singapore

HOME PAGE: www.SimonChesterman.com

Zee Kin Yeong

Singapore Academy of Law

3 St. Andrew's Road
Third Level, City Hall
Singapore, 178958
Singapore

Daniel Shu Wei Ting (Contact Author)

Singapore National Eye Centre - Artificial Intelligence for Ophthalmology

Singapore