
Preprints with The Lancet is a collaboration between The Lancet Group of journals and SSRN to facilitate the open sharing of preprints for early engagement, community comment, and collaboration. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early-stage research papers that have not been peer-reviewed. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. The findings should not be used for clinical or public health decision-making or presented without highlighting these facts. For more information, please see the FAQs.
Generalizability and Diagnostic Efficacy of AI Models for Thyroid Ultrasound
36 Pages Posted: 19 Apr 2022
Abstract
Background: Artificial intelligence (AI) models have improved the ultrasound assessment of thyroid nodules. However, they generalize poorly in practice because their training images are homogeneous and drawn from a limited number of centers.
Methods: Ultrasound AI models were constructed from a real-world nationwide dataset of 10,023 patients with pathologically confirmed thyroid nodules, collected from November 2017 to January 2019 at 208 medical institutes of different levels in all 31 administrative regions of mainland China and covering 12 ultrasound equipment vendors. The generalizability of the segmentation and classification models across hospitals, vendors, and regions was evaluated using the Dice coefficient and the area under the receiver operating characteristic curve (AUC), respectively. To determine how best to incorporate AI into clinical practice, three senior and three junior radiologists read ultrasound images from 1,020 patients in the test dataset under three scenarios: diagnosis without AI assistance, free-style AI assistance, and rule-based AI assistance.
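The two metrics named in the Methods can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `dice_coefficient` and `auc_score`, the toy masks, and the toy scores are all hypothetical, and the AUC is computed here with the pairwise (Mann-Whitney) formulation, which is only one of several equivalent ways to obtain it.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        # Both masks empty: scored as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * np.logical_and(pred, true).sum() / denom


def auc_score(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Compare every positive score with every negative score; ties count as half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Toy data (hypothetical): a 2x3 predicted mask against a 2x3 annotated mask.
pred = [[1, 1, 0], [0, 1, 0]]
true = [[1, 0, 0], [0, 1, 1]]
print(dice_coefficient(pred, true))

# Toy data (hypothetical): 0 = benign, 1 = malignant, with model scores.
print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Under this sketch, a generalizability comparison would repeat the same computation on test subsets grouped by hospital, vendor, or region and compare the resulting values.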
Findings: The segmentation task used 10,320 manually annotated images from 5,478 patients, and the classification task used 24,944 images from 10,023 patients. Among the segmentation and classification models built by hospital, vendor, or region, the highest Dice value (0.9007) was achieved by the segmentation model trained and tested on nationwide data, and the highest AUC (0.8526) by the classification model trained on mixed-vendor data and tested on a general dataset covering all vendors. For the classification task, the AI model outperformed all six radiologists (P < 0.05 for all). In rule-based AI-assistance mode, all radiologists significantly improved their diagnostic performance (P < 0.05 for all).
Interpretation: Data diversity improves the generalization of thyroid ultrasound AI models. When combined with radiologists' judgment in an appropriate way, the highly generalizable model developed here can improve the diagnosis of malignancy.
Funding Information: This study was supported by the National Natural Science Foundation of China.
Declaration of Interests: None.
Ethics Approval Statement: This study was approved by the institutional review board (IRB) of Ruijin Hospital, and was undertaken according to the Declaration of Helsinki. Informed consent from patients was waived by the IRB because of the retrospective nature of this study.
Keywords: Artificial intelligence, generalizability, thyroid, deep learning, ultrasound