Evaluation of the Capability of ChatGPT and Other Artificial Intelligence (AI) Engines to Detect Dental Caries in Dental X-Ray Images or Radiograms
11 Pages. Posted: 24 Apr 2025
Date Written: February 21, 2025
Abstract
We investigate the performance of four Vision and Language Models (VLMs), namely OpenAI’s ChatGPT 4o, OpenAI’s ChatGPT 4o1, Microsoft’s Copilot, and Anthropic’s Claude 3.5 Sonnet, in detecting dental caries (cavities) in dental X-ray images. We curated a set of 250 dental X-ray images, evenly split between caries-present and caries-absent cases, and tested each VLM individually for accuracy and self-reported confidence. Claude 3.5 Sonnet achieved the highest accuracy (65.7%), while ChatGPT 4o1 had the lowest (45.7%). We also analyzed patterns in the models’ self-reported confidence levels and examined correlations in performance across the tested AI engines. Our findings underscore both the progress and the challenges that contemporary VLMs face in dental caries detection: although certain VLMs approach a level of accuracy that may be useful for preliminary screening, none can yet provide reliable standalone diagnoses.
Keywords: Artificial Intelligence (AI) in Dentistry, AI in Oral Radiology, AI in Dental Diagnosis, Large Language Model (LLM), Vision and Language Model (VLM), Large Multi-Modalities Model (LMMM)