Evaluating the Accuracy of Gemini 2.0 Advanced and ChatGPT 4o in Cataract Knowledge: A Performance Analysis Using Brazilian Council of Ophthalmology Board Exam Questions.

Authors

Casagrande Diego, Gobira Mauro

Affiliation

Ophthalmology, Vision Institute (IPEPO), São Paulo, BRA.

Publication

Cureus. 2025 Feb 24;17(2):e79565. doi: 10.7759/cureus.79565. eCollection 2025 Feb.

DOI:10.7759/cureus.79565

PMID:40144426

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11939833/

Abstract

INTRODUCTION

Large language models (LLMs) like Gemini 2.0 Advanced and ChatGPT-4o are increasingly applied in medical contexts. This study assesses their accuracy in answering cataract-related questions from Brazilian ophthalmology board exams, evaluating their potential for clinical decision support.

METHODS

A retrospective analysis was conducted using 221 multiple-choice questions. Responses from both LLMs were evaluated by two independent ophthalmologists against the official answer key. Accuracy rates and inter-evaluator agreement (Cohen's kappa) were analyzed.
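The two statistics named in the methods can be illustrated with a minimal sketch. It assumes each evaluator's grading is stored as a list of booleans (True meaning the model's response was judged correct against the answer key); the function names and the ten example gradings are hypothetical and are not study data.

```python
from collections import Counter


def accuracy(graded):
    """Fraction of responses graded correct (True)."""
    return sum(graded) / len(graded)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Made-up gradings of ten responses (True = judged correct); not study data.
evaluator_1 = [True, True, True, False, True, True, False, True, True, True]
evaluator_2 = [True, True, False, False, True, True, True, True, True, True]

print(accuracy(evaluator_1))
print(accuracy(evaluator_2))
print(cohens_kappa(evaluator_1, evaluator_2))
```

In this toy example both evaluators record the same accuracy while disagreeing on two individual items, which is why agreement is reported separately from accuracy.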

RESULTS

Gemini 2.0 Advanced achieved 85.45% and 80.91% accuracy, while ChatGPT-4o scored 80.00% and 84.09%. Inter-evaluator agreement was moderate (κ = 0.514 and 0.431, respectively). Performance varied across exam years.
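The "moderate" label is consistent with the widely used Landis and Koch bands for kappa, under which values from 0.41 to 0.60 count as moderate agreement. The abstract does not state which scale the authors applied, so the sketch below assumes that one; the function name is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to its Landis and Koch agreement band (assumed scale)."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# The two kappa values reported in the results.
for kappa in (0.514, 0.431):
    print(kappa, landis_koch_label(kappa))
```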

CONCLUSION

Both models demonstrated high accuracy in cataract-related board exam questions, supporting their potential as educational tools. However, moderate agreement and performance variability indicate the need for further refinement and validation.
