Deep Learning Under Scrutiny: Performance Against Health Care Professionals in Detecting Diseases from Medical Imaging - Systematic Review and Meta-Analysis
33 Pages Posted: 13 May 2019
Background: Deep learning offers considerable promise for medical diagnostics. In this review, we evaluated the diagnostic accuracy of deep learning (DL) algorithms versus health care professionals (HCPs) in classifying diseases from medical imaging.
Methods: We searched (Pre-)Medline, Embase, Science Citation Index, Conference Proceedings Citation Index, and arXiv from 01 January 2012 until 31 May 2018. Studies comparing the diagnostic performance of DL models and HCPs, for any pre-specified condition based on medical imaging material, were included. We extracted binary diagnostic accuracy data and constructed contingency tables at the reported thresholds to derive the outcomes of interest: sensitivity and specificity. Studies undertaking an out-of-sample validation were included in a meta-analysis.
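As a minimal sketch of the extraction step described in the Methods, the snippet below derives sensitivity and specificity from a single 2x2 contingency table. The class name `ContingencyTable`, its field names, and the example counts are hypothetical and chosen only for illustration; they are not taken from the review or from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table of binary test results against the reference standard."""
    tp: int  # diseased cases correctly classified as diseased
    fp: int  # non-diseased cases incorrectly classified as diseased
    fn: int  # diseased cases incorrectly classified as non-diseased
    tn: int  # non-diseased cases correctly classified as non-diseased

    def sensitivity(self) -> float:
        """Proportion of diseased cases detected: TP / (TP + FN)."""
        diseased = self.tp + self.fn
        if diseased == 0:
            raise ValueError("sensitivity is undefined without diseased cases")
        return self.tp / diseased

    def specificity(self) -> float:
        """Proportion of non-diseased cases ruled out: TN / (TN + FP)."""
        healthy = self.tn + self.fp
        if healthy == 0:
            raise ValueError("specificity is undefined without non-diseased cases")
        return self.tn / healthy


# Hypothetical counts for illustration only (not data from the review).
example = ContingencyTable(tp=80, fp=10, fn=20, tn=90)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```

Both measures depend on the threshold at which a model output is converted into a binary call, which is why the tables are built at the thresholds each study reported.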
Results: 24 studies, from a starting number of 19889, compared DL models with HCPs. 22 studies provided enough data to construct contingency tables, enabling calculation of test accuracy. The mean sensitivity for DL models was 78% (range 13 - 100%), and mean specificity was 86% (range 51 - 100%). Five studies performed an out-of-sample external validation and were therefore included in the meta-analysis. We found a pooled sensitivity of 86% (95% CI: 84 - 88%) for DL models and 93% (95% CI: 87 - 97%) for HCPs, and a pooled specificity of 88% (95% CI: 84 - 92%) for DL models and 87% (95% CI: 84 - 89%) for HCPs.
Conclusion: Our review found the diagnostic performance of deep learning models to be similar to that of health care professionals. A major finding was that poor reporting and potential biases arising from study design limited reliable interpretation of the reported diagnostic accuracy. New reporting standards that address the specific challenges of deep learning could improve future studies, enabling greater confidence in the results of future evaluations of this promising technology.
Funding Statement: The authors state: "None"
Declaration of Interests: All authors have completed the ICMJE uniform disclosure form online (available on request from the corresponding author) and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethics Approval Statement: The authors state: "Not required." The authors utilized PRISMA and MOOSE protocols.
Keywords: Artificial intelligence, machine learning, deep learning, medical imaging, diagnosis, classification, diagnostic accuracy, sensitivity, specificity, systematic review, meta-analysis