Gemini AI vs. ChatGPT: A comprehensive examination alongside ophthalmology residents in medical knowledge.

Authors

Bahir Daniel, Zur Omri, Attal Leah, Nujeidat Zaki, Knaanie Ariela, Pikkel Joseph, Mimouni Michael, Plopsky Gilad

Affiliations

Department of Ophthalmology, Tzafon Medical Center, Poriya, Israel.

Azrieli Faculty of Medicine, Bar Ilan University, Safed, Israel.

Publication information

Graefes Arch Clin Exp Ophthalmol. 2025 Feb;263(2):527-536. doi: 10.1007/s00417-024-06625-4. Epub 2024 Sep 15.

DOI: 10.1007/s00417-024-06625-4

PMID: 39277830

Abstract

INTRODUCTION

The rapid advancement of artificial intelligence (AI), particularly in large language models like ChatGPT and Google's Gemini AI, marks a transformative era in technological innovation. This study explores the potential of AI in ophthalmology, focusing on the capabilities of ChatGPT and Gemini AI. While these models hold promise for medical education and clinical support, their integration requires comprehensive evaluation. This research aims to bridge a gap in the literature by comparing Gemini AI and ChatGPT, assessing their performance against ophthalmology residents using a dataset derived from ophthalmology board exams.

METHODS

A dataset comprising 600 questions across 12 subspecialties was curated from Israeli ophthalmology residency exams, encompassing text and image-based formats. Four AI models - ChatGPT-3.5, ChatGPT-4, Gemini, and Gemini Advanced - underwent testing on this dataset. The study includes a comparative analysis with Israeli ophthalmology residents, employing specific metrics for performance assessment.

RESULTS

Gemini Advanced performed best, with a 66% accuracy rate, followed by ChatGPT-4 at 62% and Gemini at 58%; ChatGPT-3.5 served as the reference at 46%. Comparative analysis with residents offered insights into the AI models' performance relative to human-level medical knowledge. Further analysis examined yearly performance trends, topic-specific variations, and the impact of images on chatbot accuracy.
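As a rough aid to reading these percentages, the sketch below attaches a 95% Wilson score interval to each reported accuracy. It is an illustration only, not the authors' analysis: the abstract does not state which statistical methods were used, and the code assumes that every model answered all 600 questions, which the abstract does not confirm (image-based items may have been handled differently). The names `REPORTED_ACCURACY`, `N_QUESTIONS`, `wilson_interval`, and `summarize` are hypothetical.

```python
from math import sqrt

# Accuracy figures as reported in the abstract (fraction of correct answers).
REPORTED_ACCURACY = {
    "Gemini Advanced": 0.66,
    "ChatGPT-4": 0.62,
    "Gemini": 0.58,
    "ChatGPT-3.5": 0.46,
}

# ASSUMPTION: each model answered all 600 questions; the abstract does not confirm this.
N_QUESTIONS = 600


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed over n trials."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def summarize(accuracy: dict[str, float], n: int) -> dict[str, tuple[float, float]]:
    """Map each model name to its (lower, upper) Wilson interval."""
    return {model: wilson_interval(p, n) for model, p in accuracy.items()}


if __name__ == "__main__":
    for model, (low, high) in summarize(REPORTED_ACCURACY, N_QUESTIONS).items():
        print(f"{model}: {REPORTED_ACCURACY[model]:.0%} (95% CI {low:.1%} to {high:.1%})")
```

Under the stated assumption, intervals of this width suggest that small gaps between adjacent models (for example 66% versus 62%) should be read cautiously, while the gap to ChatGPT-3.5 is much larger than the interval width.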

CONCLUSION

The study unveils nuanced AI model capabilities in ophthalmology, emphasizing domain-specific variations. The superior performance of Gemini Advanced indicates significant advancements, while ChatGPT-4's improvement is noteworthy. Both Gemini and ChatGPT-3.5 demonstrated commendable performance. The comparative analysis underscores AI's evolving role as a supplementary tool in medical education. This research contributes vital insights into AI effectiveness in ophthalmology, highlighting areas for refinement. As AI models evolve, targeted improvements can enhance adaptability across subspecialties, making them valuable tools for medical professionals and enriching patient care.

KEY MESSAGES

What is known: AI breakthroughs, like ChatGPT and Google's Gemini AI, are reshaping healthcare. In ophthalmology, AI integration has overhauled clinical workflows, particularly in analyzing images for diseases like diabetic retinopathy and glaucoma.

What is new: This study presents a pioneering comparison between Gemini AI and ChatGPT, evaluating their performance against ophthalmology residents using a meticulously curated dataset derived from real-world ophthalmology board exams. Notably, Gemini Advanced demonstrates superior performance, showcasing substantial advancements, while the evolution of ChatGPT-4 also merits attention. Both models exhibit commendable capabilities. These findings offer crucial insights into the efficacy of AI in ophthalmology, shedding light on areas ripe for further enhancement and optimization.

Similar articles

Comparison of Gemini Advanced and ChatGPT 4.0's Performances on the Ophthalmology Resident Ophthalmic Knowledge Assessment Program (OKAP) Examination Review Question Banks.

Cureus. 2024 Sep 17;16(9):e69612. doi: 10.7759/cureus.69612. eCollection 2024 Sep.

Exploring the role of artificial intelligence in Turkish orthopedic progression exams.

Acta Orthop Traumatol Turc. 2025 Mar 17;59(1):18-26. doi: 10.5152/j.aott.2025.24090.

Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study.

JMIR Med Educ. 2024 Oct 8;10:e56128. doi: 10.2196/56128.

Performance of artificial intelligence on Turkish dental specialization exam: can ChatGPT-4.0 and gemini advanced achieve comparable results to humans?

BMC Med Educ. 2025 Feb 10;25(1):214. doi: 10.1186/s12909-024-06389-9.

Performance of three artificial intelligence (AI)-based large language models in standardized testing; implications for AI-assisted dental education.

J Periodontal Res. 2025 Feb;60(2):121-133. doi: 10.1111/jre.13323. Epub 2024 Jul 18.

Performance evaluation of ChatGPT-4.0 and Gemini on image-based neurosurgery board practice questions: A comparative analysis.

J Clin Neurosci. 2025 Apr;134:111097. doi: 10.1016/j.jocn.2025.111097. Epub 2025 Feb 11.

Performance of ChatGPT and Bard on the official part 1 FRCOphth practice questions.

Br J Ophthalmol. 2024 Sep 20;108(10):1379-1383. doi: 10.1136/bjo-2023-324091.

Charting new AI education in gastroenterology: Cross-sectional evaluation of ChatGPT and perplexity AI in medical residency exam.

Dig Liver Dis. 2024 Aug;56(8):1304-1311. doi: 10.1016/j.dld.2024.02.019. Epub 2024 Mar 19.

Comparative Analysis of ChatGPT-4o and Gemini Advanced Performance on Diagnostic Radiology In-Training Exams.

Cureus. 2025 Mar 20;17(3):e80874. doi: 10.7759/cureus.80874. eCollection 2025 Mar.

Cited by

DeepSeek-R1 outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in bilingual complex ophthalmology reasoning.

Adv Ophthalmol Pract Res. 2025 May 9;5(3):189-195. doi: 10.1016/j.aopr.2025.05.001. eCollection 2025 Aug-Sep.

Evaluating the Performance of ChatGPT on Board-Style Examination Questions in Ophthalmology: A Meta-Analysis.

J Med Syst. 2025 Jul 5;49(1):94. doi: 10.1007/s10916-025-02227-7.

Large language models in the management of chronic ocular diseases: a scoping review.

Front Cell Dev Biol. 2025 Jun 18;13:1608988. doi: 10.3389/fcell.2025.1608988. eCollection 2025.

Enhancing ophthalmology students' awareness of retinitis pigmentosa: assessing the efficacy of ChatGPT in AI-assisted teaching of rare diseases-a quasi-experimental study.

Front Med (Lausanne). 2025 Mar 18;12:1534294. doi: 10.3389/fmed.2025.1534294. eCollection 2025.

Evaluating the Accuracy of Gemini 2.0 Advanced and ChatGPT 4o in Cataract Knowledge: A Performance Analysis Using Brazilian Council of Ophthalmology Board Exam Questions.

Cureus. 2025 Feb 24;17(2):e79565. doi: 10.7759/cureus.79565. eCollection 2025 Feb.

Effectiveness of various general large language models in clinical consensus and case analysis in dental implantology: a comparative study.

BMC Med Inform Decis Mak. 2025 Mar 26;25(1):147. doi: 10.1186/s12911-025-02972-2.

Emergency Medicine Assistants in the Field of Toxicology, Comparison of ChatGPT-3.5 and GEMINI Artificial Intelligence Systems.

Acta Med Litu. 2024;31(2):294-301. doi: 10.15388/Amed.2024.31.2.18. Epub 2024 Dec 4.

Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology.

Pediatr Nephrol. 2025 Mar 5. doi: 10.1007/s00467-025-06723-3.

Artificial intelligence and glaucoma: a lucid and comprehensive review.

Front Med (Lausanne). 2024 Dec 16;11:1423813. doi: 10.3389/fmed.2024.1423813. eCollection 2024.
