

Improved Performance of ChatGPT-4 on the OKAP Examination: A Comparative Study with ChatGPT-3.5.

Authors

Teebagy Sean, Colwell Lauren, Wood Emma, Yaghy Antonio, Faustina Misha

Affiliation

Department of Ophthalmology and Visual Sciences, UMass Chan Medical School, Worcester, Massachusetts.

Publication

J Acad Ophthalmol (2017). 2023 Sep 11;15(2):e184-e187. doi: 10.1055/s-0043-1774399. eCollection 2023 Jul.

DOI: 10.1055/s-0043-1774399
PMID: 37701862
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10495224/
Abstract

This study aims to evaluate the performance of ChatGPT-4, an advanced artificial intelligence (AI) language model, on the Ophthalmology Knowledge Assessment Program (OKAP) examination compared to its predecessor, ChatGPT-3.5. Both models were tested on 180 OKAP practice questions covering various ophthalmology subject categories. ChatGPT-4 significantly outperformed ChatGPT-3.5 (81% vs. 57%; p < 0.001), indicating improvements in medical knowledge assessment. The superior performance of ChatGPT-4 suggests potential applicability in ophthalmologic education and clinical decision support systems. Future research should focus on refining AI models, ensuring a balanced representation of fundamental and specialized knowledge, and determining the optimal method of integrating AI into medical education and practice.
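The reported gap (81% vs. 57% on 180 questions each, p < 0.001) can be sanity-checked with a standard two-proportion z-test. This is a minimal sketch, assuming 180 questions per model and scores rounded to the stated percentages; the abstract does not state which statistical test the authors actually used.

```python
from math import sqrt, erfc

# Figures taken from the abstract (assumption: 180 questions per model,
# correct counts reconstructed from the rounded 81% and 57% scores).
n = 180
correct_gpt4 = round(0.81 * n)   # ~146 correct answers
correct_gpt35 = round(0.57 * n)  # ~103 correct answers

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the normal tail via the complementary error function
    return z, erfc(abs(z) / sqrt(2))

z, p = two_proportion_z(correct_gpt4, n, correct_gpt35, n)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Under these assumptions the z statistic is roughly 4.9, giving a two-sided p-value well below 0.001, consistent with the abstract's claim of a significant difference.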


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1684/10495224/82cb9859803f/10-1055-s-0043-1774399-i425-1.jpg

Similar Articles

1. Improved Performance of ChatGPT-4 on the OKAP Examination: A Comparative Study with ChatGPT-3.5.
   J Acad Ophthalmol (2017). 2023 Sep 11;15(2):e184-e187. doi: 10.1055/s-0043-1774399. eCollection 2023 Jul.
2. Comparison of Gemini Advanced and ChatGPT 4.0's Performances on the Ophthalmology Resident Ophthalmic Knowledge Assessment Program (OKAP) Examination Review Question Banks.
   Cureus. 2024 Sep 17;16(9):e69612. doi: 10.7759/cureus.69612. eCollection 2024 Sep.
3. Success of ChatGPT, an AI language model, in taking the French language version of the European Board of Ophthalmology examination: A novel approach to medical knowledge assessment.
   J Fr Ophtalmol. 2023 Sep;46(7):706-711. doi: 10.1016/j.jfo.2023.05.006. Epub 2023 Aug 1.
4. Gemini AI vs. ChatGPT: A comprehensive examination alongside ophthalmology residents in medical knowledge.
   Graefes Arch Clin Exp Ophthalmol. 2025 Feb;263(2):527-536. doi: 10.1007/s00417-024-06625-4. Epub 2024 Sep 15.
5. Evaluating the Performance of ChatGPT in Ophthalmology: An Analysis of Its Successes and Shortcomings.
   Ophthalmol Sci. 2023 May 5;3(4):100324. doi: 10.1016/j.xops.2023.100324. eCollection 2023 Dec.
6. Development and Evaluation of Aeyeconsult: A Novel Ophthalmology Chatbot Leveraging Verified Textbook Knowledge and GPT-4.
   J Surg Educ. 2024 Mar;81(3):438-443. doi: 10.1016/j.jsurg.2023.11.019. Epub 2023 Dec 21.
7. Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study.
   JMIR Med Educ. 2024 Jan 18;10:e50842. doi: 10.2196/50842.
8. A multicenter analysis of the ophthalmic knowledge assessment program and American Board of Ophthalmology written qualifying examination performance.
   Ophthalmology. 2012 Oct;119(10):1949-53. doi: 10.1016/j.ophtha.2012.06.010. Epub 2012 Jul 28.
9. Assessing the Capability of ChatGPT in Answering First- and Second-Order Knowledge Questions on Microbiology as per Competency-Based Medical Education Curriculum.
   Cureus. 2023 Mar 12;15(3):e36034. doi: 10.7759/cureus.36034. eCollection 2023 Mar.
10. ChatGPT Conquers the Saudi Medical Licensing Exam: Exploring the Accuracy of Artificial Intelligence in Medical Knowledge Assessment and Implications for Modern Medical Education.
   Cureus. 2023 Sep 11;15(9):e45043. doi: 10.7759/cureus.45043. eCollection 2023 Sep.

Cited By

1. ChatGPT-4o and OpenAI-o1: A Comparative Analysis of Its Accuracy in Refractive Surgery.
   J Clin Med. 2025 Jul 22;14(15):5175. doi: 10.3390/jcm14155175.
2. Evaluating ChatGPT-4 Plus in Ophthalmology: Effect of Image Recognition and Domain-Specific Pretraining on Diagnostic Performance.
   Diagnostics (Basel). 2025 Jul 19;15(14):1820. doi: 10.3390/diagnostics15141820.
3. EYE-Llama, an in-domain large language model for ophthalmology.
   iScience. 2025 Jun 23;28(7):112984. doi: 10.1016/j.isci.2025.112984. eCollection 2025 Jul 18.
4. Utilizing ChatGPT-3.5 to Assist Ophthalmologists in Clinical Decision-making.
   J Ophthalmic Vis Res. 2025 May 5;20. doi: 10.18502/jovr.v20.14692. eCollection 2025.
5. Evaluation and comparison of large language models' responses to questions related optic neuritis.
   Front Med (Lausanne). 2025 Jun 25;12:1516442. doi: 10.3389/fmed.2025.1516442. eCollection 2025.
6. Evaluating the accuracy of advanced language learning models in ophthalmology: A comparative study of ChatGPT-4o and Meta AI's Llama 3.1.
   Adv Ophthalmol Pract Res. 2025 Jan 6;5(2):95-99. doi: 10.1016/j.aopr.2025.01.002. eCollection 2025 May-Jun.
7. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.
   J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.
8. ChatGPT and Other Large Language Models in Medical Education - Scoping Literature Review.
   Med Sci Educ. 2024 Nov 13;35(1):555-567. doi: 10.1007/s40670-024-02206-6. eCollection 2025 Feb.
9. ChatGPT-4 Omni's superiority in answering multiple-choice oral radiology questions.
   BMC Oral Health. 2025 Feb 1;25(1):173. doi: 10.1186/s12903-025-05554-w.
10. Evaluating the Performance of ChatGPT 3.5 and 4.0 on StatPearls Oculoplastic Surgery Text- and Image-Based Exam Questions.
   Cureus. 2024 Nov 16;16(11):e73812. doi: 10.7759/cureus.73812. eCollection 2024 Nov.

References

1. Evaluating the Performance of ChatGPT in Ophthalmology: An Analysis of Its Successes and Shortcomings.
   Ophthalmol Sci. 2023 May 5;3(4):100324. doi: 10.1016/j.xops.2023.100324. eCollection 2023 Dec.
2. Can artificial intelligence pass the Fellowship of the Royal College of Radiologists examination? Multi-reader diagnostic accuracy study.
   BMJ. 2022 Dec 21;379:e072826. doi: 10.1136/bmj-2022-072826.
3. The Pursuit of Generalizability and Equity Through Artificial Intelligence-Based Risk Prediction Models.
   JAMA Ophthalmol. 2022 Aug 1;140(8):798-799. doi: 10.1001/jamaophthalmol.2022.2139.
4. Do no harm: a roadmap for responsible machine learning for health care.
   Nat Med. 2019 Sep;25(9):1337-1340. doi: 10.1038/s41591-019-0548-6. Epub 2019 Aug 19.