Artificial Intelligence in Cardiac Treatment Decision-Making: An Evaluation of the Performance of ChatGPT Versus the Heart Team in Coronary Revascularization.

Authors

Mola Serkan, Yıldırım Alp, Gül Enis Burak

Affiliations

Cardiovascular Surgery Department, Ankara Bilkent City Hospital, 06800 Ankara, Turkey.

Cardiovascular Surgery Department, Ankara Atatürk Sanatoryum Training and Research Hospital, 06290 Ankara, Turkey.

Publication information

Rev Cardiovasc Med. 2025 Aug 19;26(8):38705. doi: 10.31083/RCM38705. eCollection 2025 Aug.

DOI:10.31083/RCM38705

PMID:40927103

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12415735/

Abstract

BACKGROUND

This study aimed to investigate the performance of two versions of ChatGPT (o1 and 4o) in making decisions about coronary revascularization and to compare the recommendations of these versions with those of a multidisciplinary Heart Team. Moreover, the study aimed to assess whether the decisions generated by ChatGPT, based on the internal knowledge base of the system and clinical guidelines, align with expert recommendations in real-world coronary artery disease management. Given the increasing prevalence and processing capabilities of large language models, such as ChatGPT, this comparison offers insights into the potential applicability of these systems in complex clinical decision-making.

METHODS

We conducted a retrospective study at a single center, which included 128 patients who underwent coronary angiography between August and September 2024. The demographics, medical history, current medications, echocardiographic findings, and angiographic findings for each patient were provided to the two ChatGPT versions. The two models were then asked to choose one of three treatment options: coronary artery bypass grafting (CABG), percutaneous coronary intervention (PCI), or medical therapy, and to justify their choice. Performance was assessed using metrics such as accuracy, sensitivity, specificity, precision, F1 score, Cohen's kappa, and Shannon's entropy.
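The agreement metrics named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names, the one-vs-rest treatment of each treatment class, and the ten-case toy data are all assumptions made here, and the abstract does not state to which quantity Shannon's entropy was applied.

```python
from collections import Counter
from math import log2


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def one_vs_rest_metrics(reference, predicted, positive):
    """Sensitivity, specificity, precision and F1 for one class versus the rest."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    tn = len(pairs) - tp - fn - fp
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def accuracy(reference, predicted):
    """Fraction of cases where the prediction matches the reference decision."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohen_kappa(reference, predicted):
    """Agreement corrected for the agreement expected by chance."""
    n = len(reference)
    observed = accuracy(reference, predicted)
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    expected = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / n**2
    return _ratio(observed - expected, 1 - expected)


def shannon_entropy(labels):
    """Entropy, in bits, of the empirical distribution of the labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


# Hypothetical toy data (NOT from the study): reference vs. model decisions.
reference = ["CABG"] * 6 + ["PCI"] * 2 + ["MED"] * 2
predicted = ["CABG"] * 5 + ["PCI", "PCI", "CABG", "CABG", "PCI"]

for label in ("CABG", "PCI", "MED"):
    print(label, one_vs_rest_metrics(reference, predicted, label))
print("accuracy:", accuracy(reference, predicted))
print("kappa:", cohen_kappa(reference, predicted))
print("entropy of predictions (bits):", shannon_entropy(predicted))
```

Cohen's kappa is worth reporting alongside accuracy when one class dominates, because a model that mostly outputs the majority class can reach a high raw accuracy while agreeing with the reference little more than chance would.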

RESULTS

The Heart Team recommended CABG for 78.1% of the patients, PCI for 12.5%, and medical therapy for 9.4%. ChatGPT o1 demonstrated higher sensitivity in identifying patients who needed CABG (82%) but lower sensitivity for PCI (43.7%), whereas ChatGPT 4o performed better in recognizing PCI candidates (68.7%) but was less accurate for CABG cases (43%). Both models struggled to identify patients suitable for medical therapy, with no correct predictions in this category. Agreement with the Heart Team was low (Cohen's kappa: 0.17 for o1 and 0.03 for 4o). Notably, these errors were often attributed to the limited understanding of the model in a clinical context and the inability to analyze angiographic images directly.
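As a rough consistency check, the reported percentages can be converted back into patient counts. The sketch below assumes that every one of the 128 patients received a Heart Team decision and that the 43% figure for ChatGPT 4o on CABG is a sensitivity like the other three values; neither point is stated explicitly in the abstract. The entropy at the end simply describes how unevenly the Heart Team decisions are spread and is not claimed to be the quantity the authors computed.

```python
from math import log2

N_PATIENTS = 128  # cohort size stated in METHODS

# Heart Team decision shares as reported in RESULTS.
heart_team_share = {"CABG": 0.781, "PCI": 0.125, "medical therapy": 0.094}

# Implied patient counts (assumes all 128 patients are included).
heart_team_counts = {
    option: round(share * N_PATIENTS) for option, share in heart_team_share.items()
}

# Reported per-class sensitivities, keyed by (model, treatment).
# Treating the 4o CABG value as a sensitivity is an assumption.
reported_sensitivity = {
    ("o1", "CABG"): 0.82,
    ("o1", "PCI"): 0.437,
    ("4o", "CABG"): 0.43,
    ("4o", "PCI"): 0.687,
}

# Implied number of correctly identified patients per model and treatment.
implied_correct = {
    key: round(value * heart_team_counts[key[1]])
    for key, value in reported_sensitivity.items()
}

# Shannon entropy (bits) of the Heart Team decision distribution.
entropy_bits = -sum(
    (count / N_PATIENTS) * log2(count / N_PATIENTS)
    for count in heart_team_counts.values()
)

print(heart_team_counts)
print(implied_correct)
print(round(entropy_bits, 3))
```

Under these assumptions the percentages map onto whole patients (100 CABG, 16 PCI, 12 medical therapy), which suggests the reported figures are internally consistent with a cohort of 128.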

CONCLUSION

While ChatGPT-based artificial intelligence (AI) models show promise in assisting with cardiac care decisions, the current limitations of these models emphasize the need for further development. Incorporating imaging data and enhancing comprehension of clinical context is essential to improve the reliability of these AI models in real-world medical settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3120/12415735/03b78d58ea1c/2153-8174-26-8-38705-g1.jpg
