

Comparison of the Performance of Artificial Intelligence Versus Medical Professionals in the Polish Final Medical Examination.

Author Information

Jaworski Aleksander, Jasiński Dawid, Jaworski Wojciech, Hop Aleksandra, Janek Artur, Sławińska Barbara, Konieczniak Lena, Rzepka Maciej, Jung Maximilian, Sysło Oliwia, Jarząbek Victoria, Błecha Zuzanna, Haraziński Konrad, Jasińska Natalia

Affiliations

Department of Medicine, Specialist Medical Centre Joint Stock Company, Polanica-Zdrój, POL.

Department of Medicine, Prof. K. Gibiński University Clinical Center of the Medical University of Silesia in Katowice, Katowice, POL.

Publication Information

Cureus. 2024 Aug 2;16(8):e66011. doi: 10.7759/cureus.66011. eCollection 2024 Aug.

Abstract

BACKGROUND

The rapid development of artificial intelligence (AI) technologies like OpenAI's Generative Pretrained Transformer (GPT), particularly ChatGPT, has shown promising applications in various fields, including medicine. This study evaluates ChatGPT's performance on the Polish Final Medical Examination (LEK), comparing its efficacy to that of human test-takers.

METHODS

The study analyzed ChatGPT's ability to answer 196 multiple-choice questions from the spring 2021 LEK. Questions were categorized as "clinical cases" or "other" general medical knowledge, then subdivided by medical field. Two versions of ChatGPT (3.5 and 4.0) were tested. Statistical analyses, including Pearson's χ² test and the Mann-Whitney U test, were conducted to compare the AI's performance and confidence levels.
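To illustrate the kind of comparison described above, the following is a minimal pure-Python sketch of Pearson's χ² test on a 2×2 contingency table. The correct/incorrect counts (99 and 152 of 196) are back-calculated here from the percentages reported in the results, not taken from the study's raw data, and the statistic omits any continuity correction.

```python
def chi2_2x2(a, b, c, d):
    """Pearson's chi-squared statistic (df = 1, no continuity correction)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, row, col in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        stat += (obs - expected) ** 2 / expected
    return stat

# GPT-3.5: 99 correct / 97 incorrect; GPT-4.0: 152 correct / 44 incorrect
stat = chi2_2x2(99, 97, 152, 44)
print(round(stat, 2))   # chi-squared statistic
print(stat > 3.841)     # exceeds the df=1, alpha=0.05 critical value
```

With these assumed counts the statistic far exceeds the 5% critical value for one degree of freedom, consistent with a significant difference between the two model versions.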

RESULTS

ChatGPT 3.5 correctly answered 50.51% of the questions, while ChatGPT 4.0 answered 77.55% correctly, surpassing the 56% passing threshold. Version 3.5 showed significantly higher confidence in correct answers, whereas version 4.0 maintained consistent confidence regardless of answer accuracy. No significant differences in performance were observed across different medical fields.
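The pass/fail arithmetic above can be sketched as follows; the question counts (99 and 152) are inferred from the reported percentages of 196 items and are not stated explicitly in the abstract.

```python
TOTAL = 196            # questions on the spring 2021 LEK
PASS_THRESHOLD = 0.56  # reported passing threshold

for model, correct in (("ChatGPT 3.5", 99), ("ChatGPT 4.0", 152)):
    score = correct / TOTAL
    verdict = "pass" if score >= PASS_THRESHOLD else "fail"
    print(f"{model}: {score:.2%} -> {verdict}")
```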

CONCLUSIONS

ChatGPT 4.0 demonstrated the ability to pass the LEK, indicating substantial potential for AI in medical education and assessment. Future improvements in AI models, such as the anticipated ChatGPT 5.0, may further enhance performance, potentially equaling or surpassing human test-takers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/020d/11366403/1cff774322b8/cureus-0016-00000066011-i01.jpg
