Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study.

Authors

Qiang Shiqi, Zhang Haitao, Liao Yang, Zhang Yue, Gu Yanfen, Wang Yiyan, Xu Zehui, Shi Hui, Han Nuo, Yu Haiping

Affiliations

School of Medicine, Tongji University, No. 1800 Yuntai Road, Shanghai, 200120, China, 86 18964538997.

Department of Emergency and Critical Care, Shanghai East Hospital, Shanghai, China.

Publication information

J Med Internet Res. 2025 Jul 22;27:e73226. doi: 10.2196/73226.

DOI:10.2196/73226
PMID:40694436
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12306586/
Abstract

BACKGROUND

Stroke is a leading cause of disability and death worldwide, with home-based rehabilitation playing a crucial role in improving patient prognosis and quality of life. Traditional health education often lacks precision, personalization, and accessibility. In contrast, large language models (LLMs) are gaining attention for their potential in medical health education, owing to their advanced natural language processing capabilities. However, the effectiveness of LLMs in home-based stroke rehabilitation remains uncertain.

OBJECTIVE

This study evaluates the effectiveness of 4 LLMs (ChatGPT-4, MedGo, Qwen, and ERNIE Bot) in home-based stroke rehabilitation. The models were selected for their diversity in model type, clinical relevance, and accessibility at the time of study design. The aim is to offer patients with stroke more precise and secure health education pathways while exploring the feasibility of using LLMs to guide health education.

METHODS

In the first phase of this study, a literature review and expert interviews identified 15 common questions and 2 clinical cases relevant to patients with stroke in home-based rehabilitation. These were input into 4 LLMs for simulated consultations. Six medical experts (2 clinicians, 2 nursing specialists, and 2 rehabilitation therapists) evaluated the LLM-generated responses using a Likert 5-point scale, assessing accuracy, completeness, readability, safety, and humanity. In the second phase, the top 2 performing models from phase 1 were selected. Thirty patients with stroke undergoing home-based rehabilitation were recruited. Each patient asked both models 3 questions and rated the responses using a satisfaction scale; the responses were also assessed for readability, text length, and recommended reading age using a Chinese readability analysis tool. Data were analyzed using one-way ANOVA, post hoc Tukey Honestly Significant Difference tests, and paired t tests.
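The two core test statistics named in the analysis plan can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the ratings are made-up placeholder values (not study data), and the names `one_way_anova_f`, `paired_t`, `expert_ratings`, and the `model_*` labels are hypothetical. It computes only the F and t statistics; P values and the Tukey HSD post hoc comparisons need distribution functions from a statistics library (for example SciPy's `scipy.stats.f_oneway`, `tukey_hsd`, and `ttest_rel`), which are not shown here.

```python
from math import sqrt
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def paired_t(a, b):
    """Return (t, df) for a paired t test on two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Placeholder 5-point Likert ratings from 6 raters for 4 models (NOT study data)
expert_ratings = {
    "model_a": [5, 4, 5, 4, 4, 5],
    "model_b": [4, 4, 4, 3, 4, 5],
    "model_c": [3, 4, 3, 3, 4, 3],
    "model_d": [3, 3, 4, 2, 3, 3],
}
f_stat, df_between, df_within = one_way_anova_f(list(expert_ratings.values()))

# Placeholder satisfaction scores from the same 5 patients for 2 models (NOT study data)
patient_model_a = [4, 5, 4, 3, 5]
patient_model_b = [3, 4, 4, 3, 4]
t_stat, t_df = paired_t(patient_model_a, patient_model_b)

print(f"ANOVA: F({df_between}, {df_within}) = {f_stat:.3f}")
print(f"Paired t: t({t_df}) = {t_stat:.3f}")
```

The paired test fits the second phase because each patient rated both models, so the two sets of scores are not independent samples.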

RESULTS

The results revealed significant differences across the 4 models in 5 dimensions: accuracy (P=.002), completeness (P<.001), readability (P=.04), safety (P=.007), and humanity (P<.001). ChatGPT-4 outperformed all other models in each dimension, with scores for accuracy (mean 4.28, SD 0.84), completeness (mean 4.35, SD 0.75), readability (mean 4.28, SD 0.85), safety (mean 4.38, SD 0.81), and user-friendliness (mean 4.65, SD 0.66). MedGo excelled in accuracy (mean 4.06, SD 0.78) and completeness (mean 4.06, SD 0.74). Qwen and ERNIE Bot scored significantly lower across all 5 dimensions than ChatGPT-4 and MedGo. ChatGPT-4 generated the longest responses (mean 1338.35, SD 236.03) and had the highest readability score (mean 12.88). In the second phase, ChatGPT-4 performed the best overall, while MedGo provided the clearest responses.

CONCLUSIONS

LLMs, particularly ChatGPT-4 and MedGo, demonstrated promising performance in home-based stroke rehabilitation education. However, discrepancies between expert and patient evaluations highlight the need for improved alignment with patient comprehension and expectations. Enhancing clinical accuracy, readability, and oversight mechanisms will be essential for future real-world integration.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e2f/12306586/0bf1ad9ccf86/jmir-v27-e73226-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e2f/12306586/9693d6dce11d/jmir-v27-e73226-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e2f/12306586/e1cc082bb7a7/jmir-v27-e73226-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e2f/12306586/aceca0ab754d/jmir-v27-e73226-g003.jpg
