How Well Do Different AI Language Models Inform Patients About Radiofrequency Ablation for Varicose Veins?

Author information

Zyada Ayman, Fakhry Ayman, Nagib Sohiel, Seken Rahma A, Farrag Mohamed, Abouelseoud Ahmed, Alnadi Omar, Moner Mahmoud, Ghazy Ziad M

Affiliations

Vascular Surgery, University Hospitals of Leicester National Health Service (NHS) Trust, Leicester, GBR.

Vascular Surgery, Egyptian Military Medical Academy, Alexandria, EGY.

Publication information

Cureus. 2025 Jun 22;17(6):e86537. doi: 10.7759/cureus.86537. eCollection 2025 Jun.

Abstract

Introduction
The rapid integration of artificial intelligence (AI) into healthcare has led to increased public use of large language models (LLMs) to obtain medical information. However, the accuracy and clarity of AI-generated responses to patient queries remain uncertain. This study aims to evaluate and compare the quality of responses provided by five leading AI language models regarding radiofrequency ablation (RFA) for varicose veins.

Objective
To assess and compare the reliability, clarity, and usefulness of AI-generated answers to frequently asked patient questions about RFA for varicose veins, as evaluated by expert vascular surgeons.

Methods
A blinded, comparative observational study was conducted using a standardized list of eight frequently asked questions about RFA, derived from reputable vascular surgery centers across multiple countries. Five top-performing, open-access LLMs (ChatGPT-4, OpenAI, San Francisco, CA, USA; DeepSeek-R1, DeepSeek, Hangzhou, Zhejiang, China; Gemini 2.0, Google DeepMind, Mountain View, CA, USA; Grok-3, xAI, San Francisco, CA, USA; and LLaMA 3.1, Meta Platforms, Inc., Menlo Park, CA, USA) were tested. Responses from each model were independently evaluated by 32 experienced vascular surgeons using four criteria: accuracy, clarity, relevance, and depth. Statistical analyses, including Friedman and Wilcoxon signed-rank tests, were used to determine model performance.

Results
Grok-3 was rated as providing the highest-quality responses in 51.6% of instances, significantly outperforming all other models (p < 0.0001). ChatGPT-4 ranked second with 23.1%. Gemini, DeepSeek, and LLaMA showed comparable but lower performance. Question-specific analysis revealed that Grok-3 dominated responses related to procedural risks and post-procedure care, while ChatGPT-4 performed best in introductory questions. A subgroup analysis showed that user experience level had no significant impact on model preferences. While 42.4% of respondents were willing to recommend AI tools to patients, 45.5% remained uncertain, reflecting ongoing hesitation.

Conclusion
Grok-3 and ChatGPT-4 currently provide the most reliable AI-generated patient education about RFA for varicose veins. While AI holds promise in improving patient understanding and reducing physician workload, ongoing evaluation and cautious clinical integration are essential. The study establishes a baseline for future comparisons as AI technologies continue to evolve.
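The abstract names a Friedman test across the five models and Wilcoxon signed-rank tests for pairwise comparisons, but it does not state the exact scoring scheme. The following is a minimal sketch of that general workflow, assuming each rater gives every model one numeric score. The ratings, the anonymised labels "Model A" to "Model E", the Bonferroni adjustment and all helper names are hypothetical illustrations, not the authors' data or code. It uses only the standard library; in practice `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon` compute these tests, and the latter can give exact p-values for small samples.

```python
import math
from collections import Counter
from itertools import combinations

# HYPOTHETICAL ratings invented for illustration only (NOT the study's data).
# Rows are raters (blocks); columns are five anonymised models.
MODELS = ["Model A", "Model B", "Model C", "Model D", "Model E"]
SCORES = [
    [4, 3, 3, 5, 2],
    [4, 3, 2, 5, 3],
    [3, 2, 3, 5, 2],
    [5, 3, 3, 4, 2],
    [4, 2, 3, 5, 3],
    [4, 3, 2, 5, 2],
]


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman(scores):
    """Friedman chi-square statistic (with tie correction) and per-model rank sums."""
    n, k = len(scores), len(scores[0])
    ranked = [rank_with_ties(row) for row in scores]  # rank models within each rater
    rank_sums = [sum(row[j] for row in ranked) for j in range(k)]
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    ties = sum(t * (t * t - 1) for row in scores for t in Counter(row).values())
    stat /= 1 - ties / (n * k * (k * k - 1))
    return stat, rank_sums


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    assert df % 2 == 0, "closed form requires even degrees of freedom"
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus, two-sided p) via a large-sample normal approximation.

    Zero differences are dropped; no tie or continuity correction is applied.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    m = len(diffs)
    if m == 0:
        return 0, 0, 1.0
    ranks = rank_with_ties([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = m * (m + 1) / 4
    sd = math.sqrt(m * (m + 1) * (2 * m + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    return w_plus, w_minus, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    stat, rank_sums = friedman(SCORES)
    p = chi2_sf_even_df(stat, len(MODELS) - 1)  # df = k - 1
    print(f"Friedman chi2 = {stat:.3f}, p = {p:.2g}")
    for name, rs in zip(MODELS, rank_sums):
        print(f"  {name}: rank sum {rs}")
    # Post-hoc pairwise tests; the Bonferroni adjustment is an assumption here.
    pairs = list(combinations(range(len(MODELS)), 2))
    for i, j in pairs:
        w_plus, w_minus, p_pair = wilcoxon_signed_rank(
            [row[i] for row in SCORES], [row[j] for row in SCORES]
        )
        p_adj = min(1.0, p_pair * len(pairs))
        print(f"  {MODELS[i]} vs {MODELS[j]}: W = {min(w_plus, w_minus)}, adjusted p = {p_adj:.3f}")
```

The omnibus Friedman test is run first because it ranks models within each rater, so raters who score harshly or leniently overall do not distort the comparison; pairwise signed-rank tests are then read only as follow-ups. With realistic sample sizes such as the 32 raters reported here, an exact or library implementation is preferable to the normal approximation used in this sketch.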
