Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study.

Authors

Yoon Soo-Hyuk, Oh Seok Kyeong, Lim Byung Gun, Lee Ho-Jin

Affiliations

Department of Anesthesiology and Pain Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea.

Department of Anesthesiology and Pain Medicine, Korea University Guro Hospital, Korea University College of Medicine, Seoul, Republic of Korea.

Publication information

JMIR Med Educ. 2024 Sep 16;10:e56859. doi: 10.2196/56859.

Abstract

BACKGROUND

ChatGPT has been tested in health care, including the US Medical Licensing Examination and specialty exams, showing near-passing results. Its performance in the field of anesthesiology has been assessed using English board examination questions; however, its effectiveness in Korea remains unexplored.

OBJECTIVE

This study investigated the problem-solving performance of ChatGPT in the fields of anesthesiology and pain medicine in the Korean language context, highlighted advancements in artificial intelligence (AI), and explored its potential applications in medical education.

METHODS

We investigated the performance (number of correct answers/number of questions) of GPT-4, GPT-3.5, and CLOVA X in the fields of anesthesiology and pain medicine, using the in-training examinations administered to Korean anesthesiology residents over the past 5 years, each annual examination comprising 100 questions. Questions containing images, diagrams, or photographs were excluded from the analysis. Furthermore, to assess performance differences across languages, we compared GPT-4's problem-solving proficiency on the original Korean texts and on their English translations.

RESULTS

A total of 398 questions were analyzed. GPT-4 (67.8%) demonstrated significantly better overall performance than GPT-3.5 (37.2%) and CLOVA X (36.7%), whereas GPT-3.5 and CLOVA X did not differ significantly in overall performance. Additionally, GPT-4 performed better on questions translated into English, indicating a language processing discrepancy (English: 75.4% vs Korean: 67.8%; difference 7.5%; 95% CI 3.1%-11.9%; P=.001).
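
The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the abstract does not name the statistical method, so the Wald interval for a paired difference in proportions used here is an assumption. The correct-answer counts (270 and 300) are inferred from the reported percentages and n = 398, and the discordant-pair counts `b` and `c` are hypothetical values chosen only so that `b - c` matches that inferred difference.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Performance as defined in the Methods: correct answers / questions."""
    return correct / total


def paired_difference_ci(b: int, c: int, n: int, z: float = 1.96):
    """Wald confidence interval for a paired difference in proportions.

    b: questions answered correctly in English only
    c: questions answered correctly in Korean only
    n: total number of paired questions
    Returns (difference, lower bound, upper bound) as proportions.
    """
    diff = (b - c) / n
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n
    return diff, diff - z * se, diff + z * se


n_questions = 398          # reported in the abstract
korean_correct = 270       # inferred from 67.8% of 398 (assumption)
english_correct = 300      # inferred from 75.4% of 398 (assumption)

# Hypothetical discordant-pair counts; not reported in the abstract.
b, c = 56, 26

acc_korean = accuracy(korean_correct, n_questions)
acc_english = accuracy(english_correct, n_questions)
diff, lower, upper = paired_difference_ci(b, c, n_questions)

print(f"Korean accuracy:  {acc_korean:.1%}")
print(f"English accuracy: {acc_english:.1%}")
print(f"Difference: {diff:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Because the same questions were answered in both languages, the comparison is paired: only questions on which the two versions disagree contribute to the difference, which is why the interval depends on `b` and `c` rather than on the two accuracies alone.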

CONCLUSIONS

This study underscores the potential of AI tools, such as ChatGPT, in medical education and practice but emphasizes the need for cautious application and further refinement, especially in non-English medical contexts. The findings suggest that although AI advancements are promising, they require careful evaluation and development to ensure acceptable performance across diverse linguistic and professional settings.
