
Evaluating ChatGPT-4 Plus in Ophthalmology: Effect of Image Recognition and Domain-Specific Pretraining on Diagnostic Performance.

Author Information

Wu Kevin Y, Qian Shu Yu, Marchand Michael

Affiliations

Department of Surgery, Division of Ophthalmology, University of Sherbrooke, Sherbrooke, QC J1G 2E8, Canada.

Faculty of Medicine, University of Sherbrooke, Sherbrooke, QC J1G 2E8, Canada.

Publication Information

Diagnostics (Basel). 2025 Jul 19;15(14):1820. doi: 10.3390/diagnostics15141820.

Abstract

Background: In recent years, rapid advances in artificial intelligence models such as ChatGPT (version of 29 April 2024) have prompted interest across many domains of medicine, including ophthalmology. Further research is therefore needed to assess the potential of these models while also evaluating their shortcomings. Our study evaluates ChatGPT-4's performance on the American Academy of Ophthalmology's (AAO) Basic and Clinical Science Course (BCSC) Self-Assessment Program, focusing on its image recognition capabilities and on its enhancement through domain-specific pretraining. Methods: The chatbot was tested on 1300 BCSC Self-Assessment Program questions, including both text-based and image-based questions. Domain-specific pretraining was tested for performance improvements. The primary outcome was the model's accuracy on text-based and image-based multiple-choice questions. Logistic regression and post hoc analyses examined performance variation by question difficulty, image presence, and subspecialty. Results: The chatbot achieved an average accuracy of 78%, compared with the average test-taker score of 74%. The repeatability kappa was 0.85 (95% CI: 0.82-0.87). Following domain-specific pretraining, the model's overall accuracy increased to 85%. The accuracy of the model's responses depended first on question difficulty (LR = 366), followed by image presence (LR = 108) and exam section (LR = 79). Conclusions: The chatbot performed similarly to, or better than, human trainee test takers in ophthalmology, even on image recognition questions. Domain-specific training appeared to improve accuracy. While these results do not necessarily imply that the chatbot has the comprehensive skill level of a human ophthalmologist, they suggest there may be educational value in these tools if additional investigations yield similar results.
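The two headline metrics in the abstract, answer accuracy and repeatability kappa, can be sketched in a few lines of Python. This is a minimal illustration with made-up answer lists, not the study's actual data or analysis code; the function and variable names are our own.

```python
# Hypothetical scoring sketch: accuracy against an answer key, and
# Cohen's kappa as a repeatability measure between two chatbot runs.

def accuracy(pred, truth):
    """Fraction of questions answered correctly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def cohen_kappa(run1, run2):
    """Agreement between two runs beyond chance (Cohen's kappa)."""
    n = len(run1)
    labels = set(run1) | set(run2)
    p_obs = sum(a == b for a, b in zip(run1, run2)) / n  # observed agreement
    p_exp = sum((run1.count(l) / n) * (run2.count(l) / n)
                for l in labels)                          # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Illustrative multiple-choice answers (A-D) for eight questions.
truth = ["A", "B", "C", "D", "A", "B", "C", "D"]
run1  = ["A", "B", "C", "D", "A", "B", "C", "A"]
run2  = ["A", "B", "C", "D", "A", "B", "D", "A"]

print(f"accuracy of run 1: {accuracy(run1, truth):.2f}")
print(f"repeatability kappa: {cohen_kappa(run1, run2):.2f}")
```

A kappa near the study's reported 0.85 indicates that repeated runs of the model pick the same options far more often than chance agreement between two answer distributions would predict.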

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/249f/12293433/93f18c0b309b/diagnostics-15-01820-g001.jpg
