Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study.

Author information

Qiang Shiqi, Zhang Haitao, Liao Yang, Zhang Yue, Gu Yanfen, Wang Yiyan, Xu Zehui, Shi Hui, Han Nuo, Yu Haiping

Affiliations

School of Medicine, Tongji University, No. 1800 Yuntai Road, Shanghai, 200120, China, 86 18964538997.

Department of Emergency and Critical Care, Shanghai East Hospital, Shanghai, China.

Publication information

J Med Internet Res. 2025 Jul 22;27:e73226. doi: 10.2196/73226.

Abstract

BACKGROUND

Stroke is a leading cause of disability and death worldwide, with home-based rehabilitation playing a crucial role in improving patient prognosis and quality of life. Traditional health education often lacks precision, personalization, and accessibility. In contrast, large language models (LLMs) are gaining attention for their potential in medical health education, owing to their advanced natural language processing capabilities. However, the effectiveness of LLMs in home-based stroke rehabilitation remains uncertain.

OBJECTIVE

This study evaluates the effectiveness of 4 LLMs (ChatGPT-4, MedGo, Qwen, and ERNIE Bot) in home-based stroke rehabilitation. These models were selected for their diversity in model type, clinical relevance, and accessibility at the time of study design. The aim is to offer patients with stroke more precise and safer health education pathways while exploring the feasibility of using LLMs to guide health education.

METHODS

In the first phase of this study, a literature review and expert interviews identified 15 common questions and 2 clinical cases relevant to patients with stroke in home-based rehabilitation. These were input into the 4 LLMs for simulated consultations. Six medical experts (2 clinicians, 2 nursing specialists, and 2 rehabilitation therapists) rated the LLM-generated responses on a 5-point Likert scale for accuracy, completeness, readability, safety, and humanity. In the second phase, the 2 top-performing models from phase 1 were selected, and 30 patients with stroke undergoing home-based rehabilitation were recruited. Each patient asked both models 3 questions and rated the responses on a satisfaction scale. The readability, text length, and recommended reading age of the responses were assessed with a Chinese readability analysis tool. Data were analyzed using one-way ANOVA, post hoc Tukey honestly significant difference (HSD) tests, and paired t tests.
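The analysis plan names standard tests but not the software used. The sketch below is a minimal, standard-library-only Python illustration of the test statistics behind each step: a one-way ANOVA F across the 4 models, a Tukey-Kramer studentized range statistic q for one pair of models, and a paired t statistic for phase 2. Several parts are assumptions of this sketch. All ratings are hypothetical toy numbers, not study data. Pairing phase 2 scores by patient is an assumed reading of the design. The function names are illustrative. P values would additionally require the F, studentized range, and t distributions, for example via `scipy.stats.f_oneway`, `scipy.stats.tukey_hsd`, and `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, stdev


def _ss_within(groups):
    """Within-group sum of squares: each rating's squared distance from its own group mean."""
    total = 0.0
    for g in groups:
        m = mean(g)
        total += sum((x - m) ** 2 for x in g)
    return total


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the rating groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (_ss_within(groups) / df_within)
    return f_stat, df_between, df_within


def tukey_kramer_q(groups, i, j):
    """Studentized range statistic q for groups i and j (Tukey-Kramer form, pooled MSW)."""
    n_total = sum(len(g) for g in groups)
    ms_within = _ss_within(groups) / (n_total - len(groups))
    se = sqrt(ms_within / 2 * (1 / len(groups[i]) + 1 / len(groups[j])))
    return abs(mean(groups[i]) - mean(groups[j])) / se


def paired_t(a, b):
    """Return (t, df) for matched scores, assuming a[k] and b[k] come from the same patient."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


if __name__ == "__main__":
    # Hypothetical 5-point Likert ratings for 4 models (toy numbers, NOT study data)
    ratings = [
        [5, 4, 5, 4, 5],
        [4, 4, 5, 4, 3],
        [3, 4, 3, 3, 4],
        [3, 3, 4, 3, 2],
    ]
    f, dfb, dfw = one_way_anova_f(ratings)
    print(f"F({dfb}, {dfw}) = {f:.3f}")  # prints F(3, 16) = 6.125
    print(f"q(model 1 vs model 4) = {tukey_kramer_q(ratings, 0, 3):.3f}")

    # Hypothetical phase 2 satisfaction scores, one pair per patient (toy numbers)
    t, df = paired_t([5, 4, 5, 4, 5, 4], [4, 4, 4, 3, 4, 4])
    print(f"t({df}) = {t:.3f}")  # prints t(5) = 3.162
```

The sketch computes only the statistics, not the p values. That keeps it dependency-free and makes each formula visible. In practice, a library routine such as `scipy.stats.tukey_hsd` also handles the studentized range distribution needed for the post hoc p values.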

RESULTS

The results revealed significant differences among the 4 models in all 5 dimensions: accuracy (P=.002), completeness (P<.001), readability (P=.04), safety (P=.007), and humanity (P<.001). ChatGPT-4 outperformed the other models in every dimension. Its scores were accuracy (mean 4.28, SD 0.84), completeness (mean 4.35, SD 0.75), readability (mean 4.28, SD 0.85), safety (mean 4.38, SD 0.81), and humanity (mean 4.65, SD 0.66). MedGo excelled in accuracy (mean 4.06, SD 0.78) and completeness (mean 4.06, SD 0.74). Qwen and ERNIE Bot scored significantly lower than ChatGPT-4 and MedGo across all 5 dimensions. ChatGPT-4 generated the longest responses (mean length 1338.35, SD 236.03) and had the highest readability score (mean 12.88). In the second phase, ChatGPT-4 performed best overall, while MedGo provided the clearest responses.

CONCLUSIONS

LLMs, particularly ChatGPT-4 and MedGo, demonstrated promising performance in home-based stroke rehabilitation education. However, discrepancies between expert and patient evaluations highlight the need for improved alignment with patient comprehension and expectations. Enhancing clinical accuracy, readability, and oversight mechanisms will be essential for future real-world integration.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e2f/12306586/9693d6dce11d/jmir-v27-e73226-g001.jpg
