

Performance of the ChatGPT-4o Language Model in Solving the Ophthalmology Specialization Exam.

Author Information

Sławińska Barbara, Jasiński Dawid, Jaworski Aleksander, Jasińska Natalia, Jaworski Wojciech, Sysło Oliwia, Rubik Nikola, Jastrzebska Izabela, Haraziński Konrad, Goliat Weronika, Gmur Maksym, Gajewski Michal, Błecha Zuzanna, Maryniak Nicole, Latkowska Ada

Affiliations

Department of Medicine, Specialist Medical Center in Polanica-Zdrój Named After St. John Paul II, Polanica-Zdrój, POL.

Department of Paediatric Cardiology, Saint John Paul II Upper Silesian Child Health Centre, Public Clinical Hospital no.6 of the Medical University of Silesia in Katowice, Katowice, POL.

Publication Information

Cureus. 2025 Jul 28;17(7):e88908. doi: 10.7759/cureus.88908. eCollection 2025 Jul.

Abstract

Background: Artificial intelligence (AI), particularly language models such as ChatGPT, is gaining importance in medical education and knowledge assessment. Previous studies have demonstrated the growing effectiveness of AI in solving medical exams, including the Final Medical Examination (LEK) and the Polish State Specialization Exam (PES) in various specialties, raising questions about its usefulness as a tool to support specialist training.

Objective: The aim of this study was to assess the effectiveness of the latest ChatGPT-4o model in solving the PES in ophthalmology. The analysis focused on the accuracy of the answers and the model's declared confidence level in order to evaluate its potential educational usefulness.

Methods: The study was based on the official PES ophthalmology exam (Spring 2024), consisting of 120 multiple-choice questions. The ChatGPT-4o model was presented with the exam regulations and the questions, which were input in Polish. Answer accuracy was assessed against the Medical Education Center (CEM) answer key, together with the model's declared confidence level (on a scale of 1 to 5). The questions were divided into clinical and theoretical categories. Data were analyzed statistically using the chi-square test and the Mann-Whitney U test.

Results: The model provided 94 correct answers (78.3%), exceeding the passing threshold. No significant difference in effectiveness was observed between clinical and non-clinical questions (p = 0.709). Analysis of the confidence levels revealed that correct answers were significantly more often given with higher confidence (p < 0.001), suggesting that the model's self-assessment could serve as an indicator of answer accuracy.

Conclusions: ChatGPT-4o demonstrated high effectiveness on the PES ophthalmology exam, confirming the potential of AI in specialist education. The declared confidence level could serve as a useful tool for assessing the reliability of responses. Despite these promising results, expert supervision and further research across various medical fields are necessary before wider implementation of AI models in medical education.
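The two tests named in the Methods section can be reproduced with standard tooling. The sketch below is illustrative only: the paper does not report the per-category question counts or the raw confidence ratings, so all numbers here are hypothetical placeholders, and the resulting p-values will not match those reported in the study.

```python
from scipy.stats import chi2_contingency, mannwhitneyu

# Chi-square test: accuracy on clinical vs. theoretical questions.
# Hypothetical 2x2 contingency table (the paper reports only the
# overall score of 94/120, not the per-category breakdown).
#                correct  incorrect
clinical    = [48, 14]
theoretical = [46, 12]

chi2, p_cat, dof, _ = chi2_contingency([clinical, theoretical])
print(f"chi-square p = {p_cat:.3f}")

# Mann-Whitney U test: declared confidence (1-5 scale) for correct
# vs. incorrect answers. Ratings below are illustrative.
conf_correct = [5, 5, 4, 5, 4, 3, 5, 4, 5, 5]
conf_incorrect = [2, 3, 3, 2, 4, 1, 3, 2]

u, p_conf = mannwhitneyu(conf_correct, conf_incorrect,
                         alternative="greater")
print(f"Mann-Whitney U p = {p_conf:.4f}")
```

With clearly separated confidence distributions like these, the one-sided Mann-Whitney U test yields a small p-value, mirroring the paper's finding that correct answers came with significantly higher declared confidence.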

