Can Large Language Models Guide Aortic Stenosis Management? A Comparative Analysis of ChatGPT and Gemini AI.

Authors

Sezgin Ali, Tanık Veysel Ozan, Akdoğan Murat, Şahin Yusuf Bozkurt, Akbuğa Kürşat, Hekimsoy Vedat, Tunca Çağatay, Saraçoğlu Erhan, Özlek Bülent

Affiliations

Department of Cardiology, Ankara Etlik City Hospital, Ankara, Türkiye.

Department of Cardiology, Muğla Sıtkı Koçman University, School of Medicine, Muğla, Türkiye.

Publication information

Turk Kardiyol Dern Ars. 2025 Sep 8. doi: 10.5543/tkda.2025.54968.

PMID: 40919834

Abstract

OBJECTIVE

Management of aortic stenosis (AS) requires integrating complex clinical, imaging, and risk stratification data. Large language models (LLMs) such as ChatGPT and Gemini AI have shown promise in healthcare, but their performance in valvular heart disease, particularly AS, has not been thoroughly assessed. This study systematically compared ChatGPT and Gemini AI in addressing guideline-based and clinical scenario questions related to AS.

METHOD

Forty open-ended AS-related questions were developed, comprising 20 knowledge-based and 20 clinical scenario items based on the 2021 European Society of Cardiology/European Association for Cardio-Thoracic Surgery (ESC/EACTS) guidelines. Both models were queried independently. Responses were evaluated by two blinded cardiologists using a structured 4-point scoring system. Composite scores were categorized, and comparisons were performed using Wilcoxon signed-rank and chi-square tests.

RESULTS

Gemini AI achieved a significantly higher mean overall score than ChatGPT (3.96 ± 0.17 vs. 3.56 ± 0.87; P = 0.003). Fully guideline-compliant responses were more frequent with Gemini AI (95.0%) than with ChatGPT (72.5%), although the overall compliance distribution difference did not reach conventional significance (P = 0.067). Gemini AI performed more consistently across both question types. Inter-rater agreement was excellent for ChatGPT (κ = 0.94) and moderate for Gemini AI (κ = 0.66).
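
For readers unfamiliar with the κ statistic reported above, the following is a minimal sketch of how an unweighted Cohen's kappa can be computed for two raters. The abstract does not say whether the authors used a weighted or unweighted variant, so the unweighted form is an assumption, and the function name `cohen_kappa` and the example scores are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the identical score.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical 4-point scores for ten responses (NOT data from the study).
scores_a = [4, 4, 4, 3, 4, 2, 4, 4, 3, 4]
scores_b = [4, 4, 4, 3, 4, 3, 4, 4, 3, 4]
kappa = cohen_kappa(scores_a, scores_b)
print(round(kappa, 2))
```

Because kappa corrects for chance agreement, it can come out only moderate even when two raters rarely disagree, provided nearly all scores fall into a single category. That may be relevant to the Gemini AI figure, although the abstract does not discuss it.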

CONCLUSION

Gemini AI demonstrated superior accuracy, consistency, and guideline adherence compared to ChatGPT. While LLMs show potential as adjunctive tools in cardiovascular care, expert oversight remains essential, and further model refinement is needed before clinical integration, particularly in AS management.

Similar articles

1. Can Large Language Models Guide Aortic Stenosis Management? A Comparative Analysis of ChatGPT and Gemini AI.
   Turk Kardiyol Dern Ars. 2025 Sep 8. doi: 10.5543/tkda.2025.54968.
2. Comparative performance of ChatGPT, Gemini, and final-year emergency medicine clerkship students in answering multiple-choice questions: implications for the use of AI in medical education.
   Int J Emerg Med. 2025 Aug 7;18(1):146. doi: 10.1186/s12245-025-00949-6.
3. A multi-dimensional performance evaluation of large language models in dental implantology: comparison of ChatGPT, DeepSeek, Grok, Gemini and Qwen across diverse clinical scenarios.
   BMC Oral Health. 2025 Jul 28;25(1):1272. doi: 10.1186/s12903-025-06619-6.
4. Artificial Intelligence in Peripheral Artery Disease Education: A Battle Between ChatGPT and Google Gemini.
   Cureus. 2025 Jun 1;17(6):e85174. doi: 10.7759/cureus.85174. eCollection 2025 Jun.
5. How Accurate Is AI? A Critical Evaluation of Commonly Used Large Language Models in Responding to Patient Concerns About Incidental Kidney Tumors.
   J Clin Med. 2025 Aug 12;14(16):5697. doi: 10.3390/jcm14165697.
6. Performance of 3 Conversational Generative Artificial Intelligence Models for Computing Maximum Safe Doses of Local Anesthetics: Comparative Analysis.
   JMIR AI. 2025 May 13;4:e66796. doi: 10.2196/66796.
7. Prescription of Controlled Substances: Benefits and Risks
8. Artificial intelligence in asthma health literacy: a comparative analysis of ChatGPT versus Gemini.
   J Asthma. 2025 Apr 26:1-7. doi: 10.1080/02770903.2025.2495729.
9. Assessing the role of large language models in adolescent idiopathic scoliosis care: a comparison between ChatGPT and Google Gemini.
   Acta Orthop Traumatol Turc. 2025 Jul 18;59(4):222-229. doi: 10.5152/j.aott.2025.25279.
10. Evaluating large language models for renal colic imaging recommendations: a comparative analysis of Gemini, copilot, and ChatGPT-4.0.
    Int J Emerg Med. 2025 Jul 4;18(1):123. doi: 10.1186/s12245-025-00895-3.