Diagnostic Performance of ChatGPT-4o and DeepSeek-3 Differential Diagnosis of Complex Oral Lesions: A Multimodal Imaging and Case Difficulty Analysis.

Authors

Hassanein Fatma E A, El Barbary Ahmed, Hussein Radwa R, Ahmed Yousra, El-Guindy Jylan, Sarhan Susan, Abou-Bakr Asmaa

Affiliations

Oral Medicine, Periodontology, and Oral Diagnosis, Faculty of Dentistry, King Salman International University, El Tur, Egypt.

Oral Medicine and Periodontology, Faculty of Dentistry, Cairo University, Giza, Egypt.

Publication information

Oral Dis. 2025 Jul 1. doi: 10.1111/odi.70007.

DOI:10.1111/odi.70007

PMID:40589366

Abstract

BACKGROUND

AI models like ChatGPT-4o and DeepSeek-3 show diagnostic promise, but their reliability in complex, image-based oral lesions remains unclear. This study evaluated and compared the diagnostic accuracy of ChatGPT-4o (multimodal) and DeepSeek-3 (text-only) against oral medicine (OM) experts across varied lesion types and case difficulty levels.

METHODS

Eighty standardized clinical vignettes derived from real-world oral disease cases, including clinical images/radiographs, were evaluated. Differential diagnoses were generated by ChatGPT-4o, DeepSeek-3, and four board-certified OM specialists, with accuracy assessed at Top-1, Top-3, and Top-5 levels.
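The Top-1, Top-3, and Top-5 levels can be read as: a case counts as correct at Top-k if the reference diagnosis appears among the first k entries of the ranked differential list. The following is a minimal sketch of that scoring, not the study's actual procedure. It assumes exact (case-insensitive) string matching between each listed diagnosis and the reference diagnosis, which the abstract does not specify; the function names and the toy cases are hypothetical and are not study data.

```python
from typing import Sequence, Tuple


def top_k_hit(ranked: Sequence[str], reference: str, k: int) -> bool:
    """Return True if the reference diagnosis is among the first k ranked entries.

    Assumption: matching is by normalized exact string, which is a
    simplification; a real study may rely on expert adjudication instead.
    """
    target = reference.strip().lower()
    return any(d.strip().lower() == target for d in ranked[:k])


def top_k_accuracy(cases: Sequence[Tuple[Sequence[str], str]], k: int) -> float:
    """Fraction of cases whose reference diagnosis is within the top k."""
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(top_k_hit(ranked, reference, k) for ranked, reference in cases)
    return hits / len(cases)


# Hypothetical toy cases: (ranked differential list, reference diagnosis).
toy_cases = [
    (["oral lichen planus", "leukoplakia", "candidiasis"], "oral lichen planus"),
    (["pyogenic granuloma", "peripheral giant cell granuloma", "fibroma"], "fibroma"),
    (["aphthous ulcer", "traumatic ulcer", "herpes simplex",
      "erythema multiforme", "pemphigus vulgaris"], "pemphigus vulgaris"),
    (["ameloblastoma", "odontogenic keratocyst"], "dentigerous cyst"),
]

for k in (1, 3, 5):
    print(f"Top-{k} accuracy: {top_k_accuracy(toy_cases, k):.2f}")
```

By construction, Top-k accuracy can only stay the same or increase as k grows, which is why a model can trail at Top-1 yet compare well at Top-3 or Top-5.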

RESULTS

OM specialists consistently achieved the highest diagnostic accuracy. However, DeepSeek-3 significantly outperformed ChatGPT-4o at the Top-3 level (p = 0.0153) and showed greater robustness in high-difficulty and inflammatory cases despite its text-only modality. Multimodal imaging enhanced diagnostic accuracy. Regression analysis indicated lesion type and imaging modality as positive predictors, while diagnostic difficulty negatively impacted Top-1 performance.
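Because both models were scored on the same vignettes, a difference in Top-3 accuracy is a paired comparison. The abstract does not name the statistical test behind p = 0.0153; the sketch below uses an exact McNemar test purely as an illustrative assumption of how such a paired comparison could be run. The function names and the hit vectors are hypothetical and do not reproduce the study's data or p-value.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(hits_a: Sequence[bool], hits_b: Sequence[bool]) -> Tuple[int, int]:
    """Count cases where only model A was correct (b) and only model B was correct (c)."""
    if len(hits_a) != len(hits_b):
        raise ValueError("both models must be scored on the same cases")
    b = sum(1 for x, y in zip(hits_a, hits_b) if x and not y)
    c = sum(1 for x, y in zip(hits_a, hits_b) if y and not x)
    return b, c


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    Under the null hypothesis, each discordant pair favors either model
    with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-case Top-3 hits for two models on 12 toy cases.
model_a_hits = [True] * 9 + [False, True, False]
model_b_hits = [False] * 9 + [True, True, False]

b, c = discordant_counts(model_a_hits, model_b_hits)
print(f"discordant pairs: b={b}, c={c}, p={exact_mcnemar_p(b, c):.4f}")
```

Only the discordant pairs carry information in this test; cases that both models got right or both got wrong do not affect the p-value.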

CONCLUSIONS

Remarkably, the text-only DeepSeek-3 model exceeded the diagnostic performance of the multimodal ChatGPT-4o model for complex oral lesions, highlighting its structured reasoning capabilities and reduced hallucination rate. These findings underscore the potential of non-vision LLMs in diagnostic support while emphasizing the critical need for expert oversight in complex scenarios.
