Evaluating the Accuracy of Gemini 2.0 Advanced and ChatGPT 4o in Cataract Knowledge: A Performance Analysis Using Brazilian Council of Ophthalmology Board Exam Questions.

Author information

Casagrande Diego, Gobira Mauro

Affiliation

Ophthalmology, Vision institute (IPEPO), São Paulo, BRA.

Publication information

Cureus. 2025 Feb 24;17(2):e79565. doi: 10.7759/cureus.79565. eCollection 2025 Feb.

Abstract

INTRODUCTION

Large language models (LLMs) like Gemini 2.0 Advanced and ChatGPT-4o are increasingly applied in medical contexts. This study assesses their accuracy in answering cataract-related questions from Brazilian ophthalmology board exams, evaluating their potential for clinical decision support.

METHODS

A retrospective analysis was conducted using 221 multiple-choice questions. Responses from both LLMs were evaluated by two independent ophthalmologists against the official answer key. Accuracy rates and inter-evaluator agreement (Cohen's kappa) were analyzed.
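
The two statistics named above can be illustrated with a minimal sketch. It assumes each evaluator records a binary judgment (1 = response judged correct, 0 = incorrect) for every question. The function names and the ten toy ratings are hypothetical, chosen only for illustration; they are not the study's data or the authors' code.

```python
from collections import Counter


def accuracy(ratings):
    """Fraction of responses an evaluator judged correct (1 = correct, 0 = incorrect)."""
    return sum(ratings) / len(ratings)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where the two labels match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[label] / n) * (counts_b[label] / n) for label in labels)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy judgments for ten questions (NOT the study's data).
rater_a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
rater_b = [1, 1, 0, 0, 1, 1, 1, 1, 0, 1]

print(f"accuracy (evaluator A): {accuracy(rater_a):.2%}")
print(f"accuracy (evaluator B): {accuracy(rater_b):.2%}")
print(f"Cohen's kappa: {cohen_kappa(rater_a, rater_b):.3f}")
```

Kappa discounts the agreement two raters would reach by chance, so it can be well below the raw agreement rate when most responses fall into a single category.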

RESULTS

Gemini 2.0 Advanced achieved 85.45% and 80.91% accuracy, while ChatGPT-4o scored 80.00% and 84.09%. Inter-evaluator agreement was moderate (κ = 0.514 and 0.431, respectively). Performance varied across exam years.

CONCLUSION

Both models demonstrated high accuracy in cataract-related board exam questions, supporting their potential as educational tools. However, moderate agreement and performance variability indicate the need for further refinement and validation.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/edbe/11939833/08cd498fc097/cureus-0017-00000079565-i01.jpg
