How Well Do Different AI Language Models Inform Patients About Radiofrequency Ablation for Varicose Veins?

Author information

Zyada Ayman, Fakhry Ayman, Nagib Sohiel, Seken Rahma A, Farrag Mohamed, Abouelseoud Ahmed, Alnadi Omar, Moner Mahmoud, Ghazy Ziad M

Affiliations

Vascular Surgery, University Hospitals of Leicester National Health Service (NHS) Trust, Leicester, GBR.

Vascular Surgery, Egyptian Military Medical Academy, Alexandria, EGY.

Publication information

Cureus. 2025 Jun 22;17(6):e86537. doi: 10.7759/cureus.86537. eCollection 2025 Jun.

DOI:10.7759/cureus.86537

PMID:40698235

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12282550/

Abstract

Introduction: The rapid integration of artificial intelligence (AI) into healthcare has led to increased public use of large language models (LLMs) to obtain medical information. However, the accuracy and clarity of AI-generated responses to patient queries remain uncertain. This study aims to evaluate and compare the quality of responses provided by five leading AI language models regarding radiofrequency ablation (RFA) for varicose veins.

Objective: To assess and compare the reliability, clarity, and usefulness of AI-generated answers to frequently asked patient questions about RFA for varicose veins, as evaluated by expert vascular surgeons.

Methods: A blinded, comparative observational study was conducted using a standardized list of eight frequently asked questions about RFA, derived from reputable vascular surgery centers across multiple countries. Five top-performing, open-access LLMs (ChatGPT-4, OpenAI, San Francisco, CA, USA; DeepSeek-R1, DeepSeek, Hangzhou, Zhejiang, China; Gemini 2.0, Google DeepMind, Mountain View, CA, USA; Grok-3, xAI, San Francisco, CA, USA; and LLaMA 3.1, Meta Platforms, Inc., Menlo Park, CA, USA) were tested. Responses from each model were independently evaluated by 32 experienced vascular surgeons using four criteria: accuracy, clarity, relevance, and depth. Statistical analyses, including Friedman and Wilcoxon signed-rank tests, were used to determine model performance.

Results: Grok-3 was rated as providing the highest-quality responses in 51.6% of instances, significantly outperforming all other models (p < 0.0001). ChatGPT-4 ranked second with 23.1%. Gemini, DeepSeek, and LLaMA showed comparable but lower performance. Question-specific analysis revealed that Grok-3 dominated responses related to procedural risks and post-procedure care, while ChatGPT-4 performed best in introductory questions. A subgroup analysis showed that user experience level had no significant impact on model preferences. While 42.4% of respondents were willing to recommend AI tools to patients, 45.5% remained uncertain, reflecting ongoing hesitation.

Conclusion: Grok-3 and ChatGPT-4 currently provide the most reliable AI-generated patient education about RFA for varicose veins. While AI holds promise in improving patient understanding and reducing physician workload, ongoing evaluation and cautious clinical integration are essential. The study establishes a baseline for future comparisons as AI technologies continue to evolve.
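
The Methods name two rank-based tests: the Friedman test, which checks whether any of the five models differs when the same raters score all of them, and the Wilcoxon signed-rank test, which compares two models pair by pair. The Python sketch below illustrates how these statistics are computed. It assumes a hypothetical layout of one score per rater per model; the abstract does not say how scores were aggregated, and the `RATINGS` matrix and all helper function names are invented for illustration and are not the study's data or code.

```python
import math

MODELS = ["ChatGPT-4", "DeepSeek-R1", "Gemini 2.0", "Grok-3", "LLaMA 3.1"]

# HYPOTHETICAL 1-5 scores: one row per rater, one column per model (order as in MODELS).
# Invented for illustration only; these are NOT the study's ratings.
RATINGS = [
    [4, 3, 3, 5, 2],
    [4, 2, 3, 5, 3],
    [3, 3, 2, 4, 2],
    [5, 3, 4, 5, 3],
]


def average_ranks(values):
    """Rank values ascending (1 = lowest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def tie_sizes(values):
    """Sizes of each group of tied values (groups of one are ignored)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [c for c in counts.values() if c > 1]


def friedman(rows):
    """Friedman chi-square statistic (ranks within each rater) with tie correction."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        tie_term += sum(t ** 3 - t for t in tie_sizes(row))
    q = 12 * sum(s * s for s in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    q /= 1 - tie_term / (n * (k ** 3 - k))
    return q, rank_sums


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def wilcoxon_signed_rank(a, b):
    """Return (W+, W-) for paired samples a and b, dropping zero differences."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


if __name__ == "__main__":
    q, rank_sums = friedman(RATINGS)
    p = chi2_sf_even_df(q, df=len(MODELS) - 1)
    print(f"Friedman Q = {q:.3f}, p = {p:.4f}")
    for name, s in zip(MODELS, rank_sums):
        print(f"  {name}: rank sum {s}")
    grok = [row[MODELS.index("Grok-3")] for row in RATINGS]
    gpt = [row[MODELS.index("ChatGPT-4")] for row in RATINGS]
    print("Wilcoxon Grok-3 vs ChatGPT-4 (W+, W-):", wilcoxon_signed_rank(grok, gpt))
```

The tie correction divides the raw statistic by 1 - Σ(t³ - t) / (n(k³ - k)), the same adjustment `scipy.stats.friedmanchisquare` applies. With five models the Friedman test has k - 1 = 4 degrees of freedom, so the simple even-df tail formula suffices here; for other designs, and for Wilcoxon p-values, a library such as `scipy.stats.wilcoxon` is the safer choice.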
