

Accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in diagnosing oral potentially malignant lesions based on clinical case reports and image recognition.

Author Information

Pradhan P

Affiliation

15, Trauma Centre, District Hospital Neemuch Madhya Pradesh - 458441, India

Publication Information

Med Oral Patol Oral Cir Bucal. 2025 Mar 1;30(2):e224-e231. doi: 10.4317/medoral.26824.

Abstract

BACKGROUND

The accurate and timely diagnosis of oral potentially malignant lesions (OPMLs) is crucial for the effective management and prevention of oral cancer. Recent advancements in artificial intelligence technologies indicate their potential to assist in clinical decision-making. Hence, this study aimed to evaluate and compare the diagnostic accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in identifying OPMLs.

MATERIAL AND METHODS

The analysis was carried out using 42 case reports from PubMed, Scopus and Google Scholar, together with images from two datasets corresponding to different OPMLs. The reports were entered separately into GPT 3.5, 4.0, 4o and Gemini for text description-based diagnosis, and into GPT 4o and Gemini for image recognition-based diagnosis. Two subject-matter experts independently reviewed the reports and provided their evaluations.
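The abstract does not state whether the case reports were submitted through the chat interfaces or programmatically. Purely as an illustrative sketch, a text description-based query to one of these models via an API might look like the following; the client, model name and prompt wording are assumptions, not the authors' actual protocol.

    # Illustrative sketch only: client, model name and prompt wording are assumptions.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def diagnose_from_text(case_description: str, model: str = "gpt-4o") -> str:
        """Ask a chat model for the most likely diagnosis from a case report."""
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": "You are assisting with oral medicine case assessment."},
                {"role": "user",
                 "content": "Based on this case description, what is the most likely "
                            "oral potentially malignant lesion?\n\n" + case_description},
            ],
        )
        return response.choices[0].message.content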

RESULTS

For text-based diagnosis, GPT 4o produced the highest number of correct responses among the LLMs (27/42), followed by GPT 4.0 (20/42), GPT 3.5 (18/42) and Gemini (15/42). In identifying OPMLs from images, GPT 4o performed better than Gemini. Agreement between the Large Language Models (LLMs) and the subject experts was fair to moderate. None of the LLMs matched the accuracy of the subject experts in the number of lesions identified correctly.
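The abstract describes agreement only qualitatively ("fair to moderate") and does not name the statistic used; such wording conventionally corresponds to Cohen's kappa interpreted with the Landis and Koch bands. A minimal sketch of that computation, using invented placeholder labels rather than study data, is shown below.

    # Hypothetical illustration: the diagnosis labels are invented placeholders,
    # not study data, and the abstract does not confirm that Cohen's kappa was used.
    from sklearn.metrics import cohen_kappa_score

    llm_calls    = ["leukoplakia", "lichen planus", "leukoplakia", "erythroplakia", "OSMF"]
    expert_calls = ["leukoplakia", "leukoplakia",   "leukoplakia", "erythroplakia", "OSMF"]

    kappa = cohen_kappa_score(llm_calls, expert_calls)
    # Landis & Koch: 0.21-0.40 is "fair" agreement, 0.41-0.60 is "moderate".
    print(f"Cohen's kappa = {kappa:.2f}")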

CONCLUSIONS

The results point towards cautious optimism regarding the use of commonly available LLMs for diagnosing OPMLs. While their potential in diagnostic applications is undeniable, their integration should be approached judiciously.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63a8/11972639/ec97a9e9c46d/medoral-30-e224-g001.jpg
