Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study.

Author information

Yoon Soo-Hyuk, Oh Seok Kyeong, Lim Byung Gun, Lee Ho-Jin

Affiliations

Department of Anesthesiology and Pain Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea.

Department of Anesthesiology and Pain Medicine, Korea University Guro Hospital, Korea University College of Medicine, Seoul, Republic of Korea.

Publication information

JMIR Med Educ. 2024 Sep 16;10:e56859. doi: 10.2196/56859.

DOI:10.2196/56859

PMID:39284182

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11443200/

Abstract

BACKGROUND

ChatGPT has been tested in health care, including the US Medical Licensing Examination and specialty exams, showing near-passing results. Its performance in the field of anesthesiology has been assessed using English board examination questions; however, its effectiveness in Korea remains unexplored.

OBJECTIVE

This study investigated the problem-solving performance of ChatGPT in the fields of anesthesiology and pain medicine in the Korean language context, highlighted advancements in artificial intelligence (AI), and explored its potential applications in medical education.

METHODS

We investigated the performance (number of correct answers/number of questions) of GPT-4, GPT-3.5, and CLOVA X in the fields of anesthesiology and pain medicine, using the in-training examinations administered to Korean anesthesiology residents over the past 5 years, each of which comprised 100 questions. Questions containing images, diagrams, or photographs were excluded from the analysis. Furthermore, to assess performance differences across languages, we compared GPT-4's problem-solving proficiency on the original Korean texts with its proficiency on their English translations.

RESULTS

A total of 398 questions were analyzed. GPT-4 (67.8%) demonstrated significantly better overall performance than GPT-3.5 (37.2%) and CLOVA X (36.7%). However, GPT-3.5 and CLOVA X did not differ significantly in overall performance. Additionally, GPT-4 performed better on questions translated into English, indicating a language processing discrepancy (English: 75.4% vs Korean: 67.8%; difference 7.5%; 95% CI 3.1%-11.9%; P=.001).
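As a rough arithmetic check on the figures above, the following Python sketch applies the performance measure from the Methods (correct answers divided by questions) to the reported numbers. The integer counts of correct answers are not given in the abstract; they are an assumption, back-calculated here from the rounded percentages and the total of 398 questions. The confidence interval and P value are not reproduced, because they depend on per-question paired results that the abstract does not report.

```python
N_QUESTIONS = 398  # total analyzed questions, as reported in the Results

# Accuracies (percent) as reported in the Results.
reported = {
    "GPT-4 (Korean)": 67.8,
    "GPT-4 (English)": 75.4,
    "GPT-3.5": 37.2,
    "CLOVA X": 36.7,
}


def implied_correct(pct, n=N_QUESTIONS):
    """ASSUMPTION: nearest integer count consistent with a rounded percentage."""
    return round(pct / 100 * n)


def accuracy(correct, n=N_QUESTIONS):
    """Performance as defined in the Methods: correct answers / questions."""
    return correct / n


# Inferred counts (not reported in the abstract).
counts = {name: implied_correct(pct) for name, pct in reported.items()}

for name, correct in counts.items():
    print(f"{name}: {correct}/{N_QUESTIONS} = {accuracy(correct):.1%}")

# Difference between GPT-4 on English translations and on Korean originals.
diff = accuracy(counts["GPT-4 (English)"]) - accuracy(counts["GPT-4 (Korean)"])
print(f"English minus Korean: {diff:.1%}")
```

Under these inferred counts, the English and Korean accuracies differ by 30 questions out of 398, which rounds to the reported 7.5% rather than to the 7.6% obtained by subtracting the two rounded percentages.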

CONCLUSIONS

This study underscores the potential of AI tools, such as ChatGPT, in medical education and practice but emphasizes the need for cautious application and further refinement, especially in non-English medical contexts. The findings suggest that although AI advancements are promising, they require careful evaluation and development to ensure acceptable performance across diverse linguistic and professional settings.
