West China Medical School, Sichuan University, Chengdu, China.
Department of Neurology, West China Hospital, Sichuan University, Chengdu, China.
J Med Internet Res. 2024 Oct 31;26:e51095. doi: 10.2196/51095.
Alzheimer's disease (AD) is a progressive neurodegenerative disorder posing challenges to patients, caregivers, and society. Accessible and accurate information is crucial for effective AD management.
This study aimed to evaluate the accuracy, comprehensibility, clarity, and usefulness of answers generated by Generative Pretrained Transformer (GPT) models concerning the management and caregiving of patients with AD.
In total, 14 questions related to the prevention, treatment, and care of AD were identified and posed to GPT-3.5 and GPT-4 in both Chinese and English; 4 respondent neurologists were also asked to answer them. This yielded 8 sets of responses (112 in total), which were randomly coded in answer sheets. Next, 5 evaluator neurologists and 5 family members of patients were asked to rate the 112 responses using separate 5-point Likert scales. Response quality was evaluated with a set of 8 questions, each rated on a 5-point Likert scale; within this set, 3 questions were dedicated to comprehensibility and 3 to participant satisfaction.
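The design described above can be sketched in a few lines of Python to show where the figures of 8 sets and 112 responses come from. This is only an illustrative sketch, not the authors' procedure: the function names, the set labels, the fixed random seed, and the use of integer codes for blinding are all assumptions, and the abstract does not say in which language the respondent neurologists answered.

```python
import random
from itertools import product

N_QUESTIONS = 14                      # questions on prevention, treatment, and care
GPT_MODELS = ["GPT-3.5", "GPT-4"]
LANGUAGES = ["Chinese", "English"]
N_RESPONDENT_NEUROLOGISTS = 4


def build_response_sets():
    """Return the labels of the 8 response sets (4 GPT sets + 4 neurologist sets)."""
    gpt_sets = [f"{model} ({lang})" for model, lang in product(GPT_MODELS, LANGUAGES)]
    neuro_sets = [f"Neurologist {i}" for i in range(1, N_RESPONDENT_NEUROLOGISTS + 1)]
    return gpt_sets + neuro_sets


def blind_code(sets, n_questions=N_QUESTIONS, seed=0):
    """Assign each (set, question) pair a shuffled integer code.

    Hypothetical blinding scheme: the abstract only says that responses
    were "randomly coded", not how.
    """
    responses = [(s, q) for s in sets for q in range(1, n_questions + 1)]
    rng = random.Random(seed)
    rng.shuffle(responses)
    return {code: item for code, item in enumerate(responses, start=1)}


sets = build_response_sets()
coded = blind_code(sets)
print(len(sets), len(coded))  # 8 112
```

Each of the 8 sets contributes 14 responses, so each GPT model accounts for 2 × 14 = 28 responses and the 4 neurologists together account for 56.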
As of April 10, 2023, the 5 evaluator neurologists and 5 family members of patients with AD had rated the 112 responses: 28 (25%) from GPT-3.5, 28 (25%) from GPT-4, and 56 (50%) from respondent neurologists. Of the 5 (4.5%) responses rated highest by evaluator neurologists, 4 (80%) were GPT (GPT-3.5 or GPT-4) responses and 1 (20%) was a respondent neurologist's response. Of the 5 (4.5%) responses rated highest by patients' family members, all but the third-ranked response were GPT responses. Based on the evaluation by neurologists, the neurologist-generated responses achieved a mean score of 3.9 (SD 0.7), while the GPT-generated responses scored significantly higher (mean 4.4, SD 0.6; P<.001). Language and model analyses revealed no significant differences in response quality between the GPT-3.5 and GPT-4 models (GPT-3.5: mean 4.3, SD 0.7; GPT-4: mean 4.4, SD 0.5; P=.51). However, English responses outperformed Chinese responses in terms of comprehensibility (Chinese responses: mean 4.1, SD 0.7; English responses: mean 4.6, SD 0.5; P=.005) and participant satisfaction (Chinese responses: mean 4.2, SD 0.8; English responses: mean 4.5, SD 0.5; P=.04). According to the evaluator neurologists' review, Chinese responses had a mean score of 4.4 (SD 0.6), whereas English responses had a mean score of 4.5 (SD 0.5; P=.002). Among the family members of patients with AD, no significant differences were observed between GPT and neurologist responses, between GPT-3.5 and GPT-4 responses, or between Chinese and English responses.
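The percentages reported for the response breakdown follow directly from the counts. A minimal Python check, in which the helper name `share` and rounding to one decimal place are assumptions made for illustration:

```python
def share(count, total):
    """Percentage of `count` out of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


TOTAL_RESPONSES = 112

gpt35_share = share(28, TOTAL_RESPONSES)        # GPT-3.5 responses
gpt4_share = share(28, TOTAL_RESPONSES)         # GPT-4 responses
neurologist_share = share(56, TOTAL_RESPONSES)  # respondent neurologists
top5_share = share(5, TOTAL_RESPONSES)          # top 5 of all 112 responses
gpt_in_top5 = share(4, 5)                       # GPT responses among the top 5

print(gpt35_share, gpt4_share, neurologist_share, top5_share, gpt_in_top5)
# 25.0 25.0 50.0 4.5 80.0
```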
GPT can provide patient education materials on AD for patients, their families and caregivers, nurses, and neurologists. This capability can contribute to the effective health care management of patients with AD, leading to enhanced patient outcomes.