

Performance of the ChatGPT-4o Language Model in Solving the Ophthalmology Specialization Exam.

Author Information

Sławińska Barbara, Jasiński Dawid, Jaworski Aleksander, Jasińska Natalia, Jaworski Wojciech, Sysło Oliwia, Rubik Nikola, Jastrzebska Izabela, Haraziński Konrad, Goliat Weronika, Gmur Maksym, Gajewski Michal, Błecha Zuzanna, Maryniak Nicole, Latkowska Ada

Affiliations

Department of Medicine, Specialist Medical Center in Polanica-Zdrój Named After St. John Paul II, Polanica-Zdrój, POL.

Department of Paediatric Cardiology, Saint John Paul II Upper Silesian Child Health Centre, Public Clinical Hospital no.6 of the Medical University of Silesia in Katowice, Katowice, POL.

Publication Information

Cureus. 2025 Jul 28;17(7):e88908. doi: 10.7759/cureus.88908. eCollection 2025 Jul.

Abstract

Background: Artificial intelligence (AI), particularly language models such as ChatGPT, is gaining importance in medical education and knowledge assessment. Previous studies have demonstrated the growing effectiveness of AI in solving medical exams, including the Final Medical Examination (LEK) and the Polish State Specialization Exam (PES) in various specialties, raising questions about its usefulness as a tool to support specialist training.

Objective: The aim of this study was to assess the effectiveness of the latest ChatGPT-4o model in solving the PES in ophthalmology. The analysis focused on the accuracy of the answers and the model's declared confidence level in order to evaluate its potential educational usefulness.

Methods: The study was based on the official PES ophthalmology exam (Spring 2024), consisting of 120 multiple-choice questions. The ChatGPT-4o model was presented with the exam regulations and the questions, which were input in Polish. Answer accuracy was assessed against the Medical Education Center (CEM) answer key, together with the model's declared confidence level (on a scale of 1 to 5). The questions were divided into clinical and theoretical categories. Data were analyzed statistically using the chi-square test and the Mann-Whitney U test.

Results: The model provided 94 correct answers (78.3%), exceeding the passing threshold. No significant difference in effectiveness was observed between clinical and non-clinical questions (p = 0.709). Analysis of the confidence levels revealed that correct answers were significantly more often given with higher confidence (p < 0.001), suggesting that the model's self-assessment could serve as an indicator of answer accuracy.

Conclusions: ChatGPT-4o demonstrated high effectiveness on the PES ophthalmology exam, confirming the potential of AI in specialist education. The declared confidence level could serve as a useful tool for assessing the reliability of responses. Despite these promising results, expert supervision and further research across various medical fields are necessary before wider implementation of AI models in medical education.
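The two tests named in the Methods section can be reproduced with standard tooling. The sketch below is illustrative only: the paper does not report the per-category question counts or the raw confidence ratings, so all numbers here are hypothetical placeholders, and the resulting p-values will not match those reported in the study.

```python
from scipy.stats import chi2_contingency, mannwhitneyu

# Chi-square test: accuracy on clinical vs. theoretical questions.
# Hypothetical 2x2 contingency table (the paper reports only the
# overall score of 94/120, not the per-category breakdown).
#                correct  incorrect
clinical    = [48, 14]
theoretical = [46, 12]

chi2, p_cat, dof, _ = chi2_contingency([clinical, theoretical])
print(f"chi-square p = {p_cat:.3f}")

# Mann-Whitney U test: declared confidence (1-5 scale) for correct
# vs. incorrect answers. Ratings below are illustrative.
conf_correct = [5, 5, 4, 5, 4, 3, 5, 4, 5, 5]
conf_incorrect = [2, 3, 3, 2, 4, 1, 3, 2]

u, p_conf = mannwhitneyu(conf_correct, conf_incorrect,
                         alternative="greater")
print(f"Mann-Whitney U p = {p_conf:.4f}")
```

With clearly separated confidence distributions like these, the one-sided Mann-Whitney U test yields a small p-value, mirroring the paper's finding that correct answers came with significantly higher declared confidence.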

