Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study.

Author information

Yoon Soo-Hyuk, Oh Seok Kyeong, Lim Byung Gun, Lee Ho-Jin

Affiliations

Department of Anesthesiology and Pain Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea.

Department of Anesthesiology and Pain Medicine, Korea University Guro Hospital, Korea University College of Medicine, Seoul, Republic of Korea.

Publication information

JMIR Med Educ. 2024 Sep 16;10:e56859. doi: 10.2196/56859.

Abstract

BACKGROUND

ChatGPT has been tested in health care, including the US Medical Licensing Examination and specialty exams, showing near-passing results. Its performance in the field of anesthesiology has been assessed using English board examination questions; however, its effectiveness in Korea remains unexplored.

OBJECTIVE

This study investigated the problem-solving performance of ChatGPT in the fields of anesthesiology and pain medicine in the Korean language context, highlighted advancements in artificial intelligence (AI), and explored its potential applications in medical education.

METHODS

We investigated the performance (number of correct answers/number of questions) of GPT-4, GPT-3.5, and CLOVA X in the fields of anesthesiology and pain medicine, using the in-training examinations administered to Korean anesthesiology residents over the past 5 years, each consisting of 100 questions. Questions containing images, diagrams, or photographs were excluded from the analysis. Furthermore, to assess how GPT performance differs across languages, we compared GPT-4's problem-solving proficiency on the original Korean texts and on their English translations.

RESULTS

A total of 398 questions were analyzed. GPT-4 (67.8%) demonstrated significantly better overall performance than GPT-3.5 (37.2%) and CLOVA X (36.7%). However, GPT-3.5 and CLOVA X did not differ significantly in overall performance. Additionally, GPT-4 performed better on questions translated into English, indicating a language processing discrepancy (English: 75.4% vs Korean: 67.8%; difference 7.5%; 95% CI 3.1%-11.9%; P=.001).
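The performance metric defined in the Methods (number of correct answers/number of questions) can be illustrated with a minimal Python sketch. The abstract reports only percentages and the total of 398 questions, so the counts of correct answers below are back-calculated assumptions, chosen because they reproduce the reported percentages when each model is scored on all 398 questions; they are not figures taken from the paper. The sketch also suggests why the reported language difference is 7.5% rather than the 7.6% obtained by subtracting the rounded percentages.

```python
# Assumption: raw counts are back-calculated from the reported percentages
# and the 398 analyzed questions; the abstract does not state them.
TOTAL_QUESTIONS = 398

assumed_correct = {
    "GPT-4 (Korean)": 270,
    "GPT-4 (English)": 300,
    "GPT-3.5": 148,
    "CLOVA X": 146,
}


def accuracy_percent(correct: int, total: int) -> float:
    """Performance as defined in the Methods: correct answers / questions, in percent."""
    return 100.0 * correct / total


accuracies = {
    name: accuracy_percent(count, TOTAL_QUESTIONS)
    for name, count in assumed_correct.items()
}

for name, acc in accuracies.items():
    print(f"{name}: {acc:.1f}%")

# Difference computed on unrounded accuracies, then rounded for display.
language_gap = accuracies["GPT-4 (English)"] - accuracies["GPT-4 (Korean)"]
print(f"English minus Korean: {language_gap:.1f} percentage points")
```

Under these assumed counts, the difference corresponds to 30 additional correct answers out of 398 on the English translations. The confidence interval and P value reported above depend on the paired, question-level results, which the abstract does not provide, so they are not reproduced here.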

CONCLUSIONS

This study underscores the potential of AI tools, such as ChatGPT, in medical education and practice but emphasizes the need for cautious application and further refinement, especially in non-English medical contexts. The findings suggest that although AI advancements are promising, they require careful evaluation and development to ensure acceptable performance across diverse linguistic and professional settings.
