Preprints with The Lancet is a collaboration between The Lancet Group of journals and SSRN to facilitate the open sharing of preprints for early engagement, community comment, and collaboration. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early-stage research papers that have not been peer-reviewed. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. The findings should not be used for clinical or public health decision-making or presented without highlighting these facts. For more information, please see the FAQs.

Robust and Interpretable Machine Learning Assessment of Variable Importance with Moderate to Small Sample Sizes: A Study of Survival after Out-Of-Hospital Cardiac Arrest

23 Pages Posted: 20 Apr 2023

Yilin Ning

National University of Singapore (NUS) - Duke-NUS Medical School

Siqi Li

National University of Singapore (NUS) - Duke-NUS Medical School-Centre for Quantitative Medicine

Yih Yng Ng

Tan Tock Seng Hospital - Department of Preventive and Population Medicine; Centre for Healthcare Innovation; Lee Kong Chian School of Medicine

Michael Yih-Chong Chia

Tan Tock Seng Hospital - Emergency Department

Han Nee Gan

Changi General Hospital - Accident & Emergency

Ling Tiah

Changi General Hospital - Accident & Emergency

Desmond Ren-Hao Mao

Khoo Teck Puat Hospital - Department of Acute and Emergency Care

Wei Ming Ng

National University of Singapore (NUS) - Emergency Medicine Department

Benjamin Sieu-Hon Leong

National University of Singapore (NUS) - Department of Emergency Medicine

Nausheen Edwin Doctor

Sengkang General Hospital - Department of Emergency Medicine

Marcus Eng Hock Ong

National University of Singapore (NUS) - Health Services and Systems Research; Singapore General Hospital - Department of Emergency Medicine

Nan Liu

Duke-National University of Singapore Medical School - Centre for Quantitative Medicine

Abstract

Background: There is increasing interest in updating regression-based evidence on variable importance using advanced machine learning (ML) methods. However, findings from black-box ML methods may not align well with clinical understanding, and both ML and regression approaches perform worse with small sample sizes. We introduce an alternative method, the Shapley variable importance cloud (ShapleyVIC), that is less restricted by sample size.

Methods: ShapleyVIC integrates a regression-based approach with ML techniques for interpretable and robust variable importance assessment. By analyzing an ensemble of regression models, ShapleyVIC explicitly accounts for uncertainty in variable importance, reducing bias and improving resistance to sampling variability relative to conventional inference on a single model. In a study of 30-day survival after out-of-hospital cardiac arrest (OHCA), we compared ShapleyVIC with logistic regression and two commonly used ML methods (random forest and XGBoost) for assessing variable importance in the full cohort (n=7490) and for reproducing the findings in smaller subsets (n=2500 and n=500).
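The idea of assessing importance over an ensemble of regression models, rather than a single fitted model, can be illustrated with a minimal sketch. This is not the authors' implementation or the ShapleyVIC software. It assumes simulated data, draws candidate logistic models around the maximum-likelihood fit, keeps those whose loss is within a hypothetical tolerance `eps` of the best loss, and uses permutation importance as a simple stand-in for the Shapley-based measure. All function names, the tolerance, and the simulated coefficients are illustrative assumptions.

```python
import numpy as np


def simulate(n, rng):
    """Simulated cohort (assumption): intercept plus three predictors, the third with no effect."""
    Z = rng.normal(size=(n, 3))
    X = np.column_stack([np.ones(n), Z])
    true_beta = np.array([-0.5, 1.5, 0.5, 0.0])
    p = 1.0 / (1.0 + np.exp(-X @ true_beta))
    y = rng.binomial(1, p).astype(float)
    return X, y


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton's method; return coefficients and their covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, (cov + cov.T) / 2.0  # symmetrize against round-off


def log_loss(X, y, beta):
    """Mean negative log-likelihood of a logistic model."""
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z)


def permutation_importance(X, y, beta, rng):
    """Increase in loss when each predictor (excluding the intercept) is shuffled."""
    base = log_loss(X, y, beta)
    imp = np.zeros(X.shape[1] - 1)
    for j in range(1, X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        imp[j - 1] = log_loss(Xp, y, beta) - base
    return imp


def importance_cloud(X, y, n_models=200, eps=0.05, seed=0):
    """Summarize importance over models whose loss is within (1 + eps) of the best loss."""
    rng = np.random.default_rng(seed)
    beta_hat, cov = fit_logistic(X, y)
    best = log_loss(X, y, beta_hat)
    # The fitted model itself always qualifies, so the ensemble is never empty.
    kept = [permutation_importance(X, y, beta_hat, rng)]
    for beta in rng.multivariate_normal(beta_hat, cov, size=n_models):
        if log_loss(X, y, beta) <= (1.0 + eps) * best:
            kept.append(permutation_importance(X, y, beta, rng))
    kept = np.array(kept)
    lower, upper = np.percentile(kept, [2.5, 97.5], axis=0)
    return kept.mean(axis=0), lower, upper


if __name__ == "__main__":
    X, y = simulate(500, np.random.default_rng(0))
    mean_imp, lower, upper = importance_cloud(X, y)
    for name, m, lo, hi in zip(["x1", "x2", "x3"], mean_imp, lower, upper):
        print(f"{name}: mean importance {m:.4f} (interval {lo:.4f} to {hi:.4f})")
```

Reporting an interval across the retained models, rather than a single estimate, is what lets a variable with unstable importance be distinguished from one that is consistently important. The pooling used here (mean and percentiles) is a simplification chosen for brevity.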

Findings: Both ShapleyVIC and conventional logistic regression identified important factors previously reported in the literature, but the low importance of race and moderate importance of three prehospital interventions found by ShapleyVIC were more plausible than the opposite found in the regression analysis. Random forest and XGBoost generated questionable variable rankings from the full cohort and were not applied to smaller subsets. ShapleyVIC was generally consistent in shortlisting important variables when n=2500 and n=500, whereas logistic regression had attenuated statistical power and consistently identified only two variables when n=500.

Interpretation: ShapleyVIC is an interpretable and robust alternative to regression-based analyses and commonly used ML approaches for assessing variable importance in clinical applications with varying sample sizes.

Funding: This research received support from SingHealth Duke-NUS ACP Programme Funding (15/FY2020/P2/06-A79), National Medical Research Council, Clinician Scientist Award, Singapore (NMRC/CSA/024/2010, NMRC/CSA/0049/2013 and NMRC/CSA-SI/0014/2017) and Ministry of Health, Health Services Research Grant, Singapore (HSRG/0021/2012). YN is supported by the Khoo Postdoctoral Fellowship Award (project no. Duke-NUS-KPFA/2021/0051) from the Estate of Tan Sri Khoo Teck Puat.

Declaration of Interest: All other authors have no conflicts of interest to declare.

Ethical Approval: This study was approved by the Centralised Institutional Review Board (2013/604/C) and the Domain Specific Review Board (2013/00929).

Keywords: interpretable machine learning, variable importance, out-of-hospital cardiac arrest

Suggested Citation

Ning, Yilin and Li, Siqi and Ng, Yih Yng and Chia, Michael Yih-Chong and Gan, Han Nee and Tiah, Ling and Mao, Desmond Ren-Hao and Ng, Wei Ming and Leong, Benjamin Sieu-Hon and Doctor, Nausheen Edwin and Ong, Marcus Eng Hock and Liu, Nan, Robust and Interpretable Machine Learning Assessment of Variable Importance with Moderate to Small Sample Sizes: A Study of Survival after Out-Of-Hospital Cardiac Arrest. Available at SSRN: https://ssrn.com/abstract=4423470 or http://dx.doi.org/10.2139/ssrn.4423470

Yilin Ning

National University of Singapore (NUS) - Duke-NUS Medical School

Singapore
Singapore

Siqi Li

National University of Singapore (NUS) - Duke-NUS Medical School-Centre for Quantitative Medicine

Yih Yng Ng

Tan Tock Seng Hospital - Department of Preventive and Population Medicine

Centre for Healthcare Innovation

18 Jalan Tan Tock Seng
Singapore, 308443
Singapore

HOME PAGE: http://www.chi.sg

Lee Kong Chian School of Medicine

Singapore

HOME PAGE: https://www.ntu.edu.sg/medicine

Michael Yih-Chong Chia

Tan Tock Seng Hospital - Emergency Department

Han Nee Gan

Changi General Hospital - Accident & Emergency

Ling Tiah

Changi General Hospital - Accident & Emergency

Desmond Ren-Hao Mao

Khoo Teck Puat Hospital - Department of Acute and Emergency Care

Wei Ming Ng

National University of Singapore (NUS) - Emergency Medicine Department

Benjamin Sieu-Hon Leong

National University of Singapore (NUS) - Department of Emergency Medicine

Nausheen Edwin Doctor

Sengkang General Hospital - Department of Emergency Medicine

Marcus Eng Hock Ong

National University of Singapore (NUS) - Health Services and Systems Research

Singapore General Hospital - Department of Emergency Medicine

Singapore

Nan Liu (Contact Author)

Duke-National University of Singapore Medical School - Centre for Quantitative Medicine

8 College Rd.
Singapore, 169857
Singapore
+65 6601 6503 (Phone)