Florida State University College of Medicine Internal Medicine Residency Program at Lee Health, Cape Coral, Florida, USA.
Icahn School of Medicine at Mount Sinai, New York City, New York, USA.
Int J Cardiol. 2024 Dec 15;417:132576. doi: 10.1016/j.ijcard.2024.132576. Epub 2024 Sep 19.
Chat Generative Pre-trained Transformer (ChatGPT) is a natural language processing tool created by OpenAI. Much of the discussion regarding artificial intelligence (AI) in medicine centers on the ability of such language models to enhance medical practice, improve efficiency, and decrease errors. The objective of this study was to analyze the ability of ChatGPT to answer board-style cardiovascular medicine questions from the Medical Knowledge Self-Assessment Program (MKSAP). The study evaluated the performance of ChatGPT (versions 3.5 and 4), alongside internal medicine (IM) residents and IM and cardiology attending physicians, in answering 98 multiple-choice questions (MCQs) from the Cardiovascular Medicine chapter of MKSAP. ChatGPT-4 demonstrated an accuracy of 74.5%, comparable to the IM intern (63.3%), senior resident (63.3%), IM attending physician (62.2%), and ChatGPT-3.5 (64.3%), but significantly lower than the cardiology attending physician (85.7%). Subcategory analysis revealed no statistical difference between ChatGPT and the physicians, except in valvular heart disease, where the cardiology attending outperformed ChatGPT-3.5 (p = 0.031), and in heart failure, where ChatGPT-4 outperformed the senior resident (p = 0.046). While ChatGPT shows promise in certain subcategories, to establish AI as a reliable educational tool for medical professionals, ChatGPT's performance will likely need to surpass the accuracy of instructors, ideally achieving a near-perfect score on posed questions.
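For illustration, the pairwise accuracy comparisons reported above can be approximated from the stated percentages and the 98-question total. The abstract does not specify which statistical test the authors used, so the sketch below assumes a two-sided Fisher's exact test on 2x2 tables of correct/incorrect counts; the rounded correct counts and the resulting p-values are reconstructions, not the paper's published analysis.

```python
# Minimal sketch: compare answerers' accuracies on the 98 MKSAP MCQs.
# Assumption: a two-sided Fisher's exact test on correct/incorrect counts
# (the abstract does not state the test actually used in the study).
from scipy import stats

TOTAL = 98  # MCQs from the MKSAP Cardiovascular Medicine chapter

# Accuracies as reported in the abstract; correct counts are rounded.
accuracy = {
    "ChatGPT-4": 0.745,
    "ChatGPT-3.5": 0.643,
    "IM intern": 0.633,
    "Senior resident": 0.633,
    "IM attending": 0.622,
    "Cardiology attending": 0.857,
}
correct = {name: round(acc * TOTAL) for name, acc in accuracy.items()}

def compare(a: str, b: str) -> float:
    """Fisher's exact test on a 2x2 table of correct vs. incorrect answers."""
    table = [
        [correct[a], TOTAL - correct[a]],
        [correct[b], TOTAL - correct[b]],
    ]
    _, p_value = stats.fisher_exact(table)
    return p_value

for other in accuracy:
    if other != "ChatGPT-4":
        print(f"ChatGPT-4 vs {other}: p = {compare('ChatGPT-4', other):.3f}")
```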