Jaworski Aleksander, Jasiński Dawid, Jaworski Wojciech, Hop Aleksandra, Janek Artur, Sławińska Barbara, Konieczniak Lena, Rzepka Maciej, Jung Maximilian, Sysło Oliwia, Jarząbek Victoria, Błecha Zuzanna, Haraziński Konrad, Jasińska Natalia
Department of Medicine, Specialist Medical Centre Joint Stock Company, Polanica-Zdrój, POL.
Department of Medicine, Prof. K. Gibiński University Clinical Center of the Medical University of Silesia in Katowice, Katowice, POL.
Cureus. 2024 Aug 2;16(8):e66011. doi: 10.7759/cureus.66011. eCollection 2024 Aug.
The rapid development of artificial intelligence (AI) technologies like OpenAI's Generative Pretrained Transformer (GPT), particularly ChatGPT, has shown promising applications in various fields, including medicine. This study evaluates ChatGPT's performance on the Polish Final Medical Examination (LEK), comparing its efficacy to that of human test-takers.
The study analyzed ChatGPT's ability to answer 196 multiple-choice questions from the spring 2021 LEK. Questions were categorized into "clinical cases" and "other" general medical knowledge, and then divided according to medical fields. Two versions of ChatGPT (3.5 and 4.0) were tested. Statistical analyses, including Pearson's χ² test and the Mann-Whitney U test, were conducted to compare the AI's performance and confidence levels.
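The two comparisons described above can be sketched in Python with SciPy. This is an illustrative reconstruction, not the authors' actual analysis: the contingency-table counts are back-calculated from the reported accuracies (50.51% and 77.55% of 196 questions), while the confidence ratings are invented placeholder data.

```python
# Hypothetical sketch of the abstract's statistical comparisons.
# Counts are derived from the reported accuracies (196 questions:
# 50.51% ≈ 99 correct for ChatGPT 3.5, 77.55% ≈ 152 for ChatGPT 4.0);
# the confidence ratings below are invented for illustration only.
from scipy.stats import chi2_contingency, mannwhitneyu

# 2x2 contingency table: correct vs. incorrect answers per model version
table = [[99, 97],    # ChatGPT 3.5
         [152, 44]]   # ChatGPT 4.0
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")  # versions differ if p < 0.05

# Mann-Whitney U: self-reported confidence for correct vs. incorrect answers
conf_correct = [5, 4, 5, 5, 4, 5, 3, 4]    # hypothetical ratings
conf_incorrect = [3, 2, 4, 3, 2, 3, 3, 2]  # hypothetical ratings
u, p_u = mannwhitneyu(conf_correct, conf_incorrect, alternative="two-sided")
print(f"U = {u:.1f}, p = {p_u:.4f}")
```

The χ² test compares the proportion of correct answers between the two model versions, while the Mann-Whitney U test handles the ordinal confidence ratings without assuming normality.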
ChatGPT 3.5 correctly answered 50.51% of the questions, while ChatGPT 4.0 answered 77.55% correctly, surpassing the 56% passing threshold. Version 3.5 showed significantly higher confidence in correct answers, whereas version 4.0 maintained consistent confidence regardless of answer accuracy. No significant differences in performance were observed across different medical fields.
ChatGPT 4.0 demonstrated the ability to pass the LEK, indicating substantial potential for AI in medical education and assessment. Future improvements in AI models, such as the anticipated ChatGPT 5.0, may further enhance performance, potentially matching or surpassing human test-takers.