

Accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in diagnosing oral potentially malignant lesions based on clinical case reports and image recognition.

Author Information

Pradhan P

Affiliation

15, Trauma Centre, District Hospital Neemuch Madhya Pradesh - 458441, India

Publication Information

Med Oral Patol Oral Cir Bucal. 2025 Mar 1;30(2):e224-e231. doi: 10.4317/medoral.26824.

Abstract

BACKGROUND

The accurate and timely diagnosis of oral potentially malignant lesions (OPMLs) is crucial for the effective management and prevention of oral cancer. Recent advancements in artificial intelligence technologies indicate their potential to assist in clinical decision-making. Hence, this study aimed to evaluate and compare the diagnostic accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in identifying OPMLs.

MATERIAL AND METHODS

The analysis was carried out using 42 case reports from PubMed, Scopus and Google Scholar, together with images from two datasets corresponding to different OPMLs. The reports were entered separately into GPT 3.5, 4.0, 4o and Gemini for text description-based diagnosis, and into GPT 4o and Gemini for image recognition-based diagnosis. Two subject-matter experts independently reviewed the reports and provided their evaluations.
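The abstract does not state whether the case reports were submitted through the chat interfaces or programmatically. Purely as an illustrative sketch, a text description-based query to one of these models via an API might look like the following; the client, model name and prompt wording are assumptions, not the authors' actual protocol.

    # Illustrative sketch only: client, model name and prompt wording are assumptions.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def diagnose_from_text(case_description: str, model: str = "gpt-4o") -> str:
        """Ask a chat model for the most likely diagnosis from a case report."""
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": "You are assisting with oral medicine case assessment."},
                {"role": "user",
                 "content": "Based on this case description, what is the most likely "
                            "oral potentially malignant lesion?\n\n" + case_description},
            ],
        )
        return response.choices[0].message.content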

RESULTS

For text-based diagnosis, GPT 4o produced the highest number of correct responses among the LLMs (27/42), followed by GPT 4.0 (20/42), GPT 3.5 (18/42) and Gemini (15/42). In identifying OPMLs from images, GPT 4o performed better than Gemini. Agreement between the Large Language Models (LLMs) and the subject experts was fair to moderate. None of the LLMs matched the accuracy of the subject experts in the number of lesions identified correctly.
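The abstract describes agreement only qualitatively ("fair to moderate") and does not name the statistic used; such wording conventionally corresponds to Cohen's kappa interpreted with the Landis and Koch bands. A minimal sketch of that computation, using invented placeholder labels rather than study data, is shown below.

    # Hypothetical illustration: the diagnosis labels are invented placeholders,
    # not study data, and the abstract does not confirm that Cohen's kappa was used.
    from sklearn.metrics import cohen_kappa_score

    llm_calls    = ["leukoplakia", "lichen planus", "leukoplakia", "erythroplakia", "OSMF"]
    expert_calls = ["leukoplakia", "leukoplakia",   "leukoplakia", "erythroplakia", "OSMF"]

    kappa = cohen_kappa_score(llm_calls, expert_calls)
    # Landis & Koch: 0.21-0.40 is "fair" agreement, 0.41-0.60 is "moderate".
    print(f"Cohen's kappa = {kappa:.2f}")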

CONCLUSIONS

The results point towards cautious optimism regarding the use of commonly available LLMs for diagnosing OPMLs. While their potential in diagnostic applications is undeniable, their integration should be approached judiciously.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63a8/11972639/ec97a9e9c46d/medoral-30-e224-g001.jpg
