Florida State University College of Medicine Internal Medicine Residency Program at Lee Health, Cape Coral, Florida, USA.
Icahn School of Medicine at Mount Sinai, New York City, New York, USA.
Int J Cardiol. 2024 Dec 15;417:132576. doi: 10.1016/j.ijcard.2024.132576. Epub 2024 Sep 19.
Chat Generative Pre-trained Transformer (ChatGPT) is a natural language processing tool created by OpenAI. Much of the discussion regarding artificial intelligence (AI) in medicine centers on the ability of such language models to enhance medical practice, improve efficiency, and decrease errors. The objective of this study was to analyze the ability of ChatGPT to answer board-style cardiovascular medicine questions from the Medical Knowledge Self-Assessment Program (MKSAP). The study evaluated the performance of ChatGPT (versions 3.5 and 4), alongside internal medicine (IM) residents and IM and cardiology attending physicians, in answering 98 multiple-choice questions (MCQs) from the Cardiovascular Medicine chapter of MKSAP. ChatGPT-4 demonstrated an accuracy of 74.5%, comparable to the IM intern (63.3%), senior resident (63.3%), IM attending physician (62.2%), and ChatGPT-3.5 (64.3%), but significantly lower than the cardiology attending physician (85.7%). Subcategory analysis revealed no statistical difference between ChatGPT and the physicians, except in valvular heart disease, where the cardiology attending outperformed ChatGPT-3.5 (p = 0.031), and in heart failure, where ChatGPT-4 outperformed the senior resident (p = 0.046). While ChatGPT shows promise in certain subcategories, to establish AI as a reliable educational tool for medical professionals, ChatGPT's performance will likely need to surpass the accuracy of instructors, ideally achieving a near-perfect score on posed questions.
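For illustration, the pairwise accuracy comparisons reported above can be approximated from the stated percentages and the 98-question total. The abstract does not specify which statistical test the authors used, so the sketch below assumes a two-sided Fisher's exact test on 2x2 tables of correct/incorrect counts; the rounded correct counts and the resulting p-values are reconstructions, not the paper's published analysis.

```python
# Minimal sketch: compare answerers' accuracies on the 98 MKSAP MCQs.
# Assumption: a two-sided Fisher's exact test on correct/incorrect counts
# (the abstract does not state the test actually used in the study).
from scipy import stats

TOTAL = 98  # MCQs from the MKSAP Cardiovascular Medicine chapter

# Accuracies as reported in the abstract; correct counts are rounded.
accuracy = {
    "ChatGPT-4": 0.745,
    "ChatGPT-3.5": 0.643,
    "IM intern": 0.633,
    "Senior resident": 0.633,
    "IM attending": 0.622,
    "Cardiology attending": 0.857,
}
correct = {name: round(acc * TOTAL) for name, acc in accuracy.items()}

def compare(a: str, b: str) -> float:
    """Fisher's exact test on a 2x2 table of correct vs. incorrect answers."""
    table = [
        [correct[a], TOTAL - correct[a]],
        [correct[b], TOTAL - correct[b]],
    ]
    _, p_value = stats.fisher_exact(table)
    return p_value

for other in accuracy:
    if other != "ChatGPT-4":
        print(f"ChatGPT-4 vs {other}: p = {compare('ChatGPT-4', other):.3f}")
```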