Cheng Miaomiao, Zhang Qi, Liang Hua, Wang Yanan, Qin Jun, Gong Lei, Wang Sha, Li Luyao, Xiao Xiaoyan
Qilu Hospital of Shandong University, Department of Nephrology, Jinan, Shandong, China.
Healthcare Big Data Research Institute, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, China.
Front Endocrinol (Lausanne). 2025 Apr 22;16:1559265. doi: 10.3389/fendo.2025.1559265. eCollection 2025.
Diabetic kidney disease (DKD) is a common and serious complication of diabetes mellitus and has become the most important cause of end-stage renal disease (ESRD). Given the rising prevalence of diabetes, early detection of and intervention in DKD are increasingly urgent. With the rapid development of artificial intelligence (AI) technologies, particularly large language models (LLMs), their potential applications in patient education are receiving increasing attention. The aim of this study was to evaluate the quality of LLM-generated patient education materials (PEMs) for early DKD and to explore the feasibility of using them in patient education.
Four LLMs (ERNIE Bot 4.0, GPT-4o, ChatGLM4, and ChatGPT-o1) were selected to generate PEMs. ERNIE Bot 4.0, GPT-4o, and ChatGLM4 each generated two versions of the PEM: one based on American Diabetes Association (ADA) guidelines and one without them. ChatGPT-o1 generated only a PEM without ADA guidelines. An experienced physician wrote a PEM based on ADA guidelines. All materials were assessed using a Likert scale covering the dimensions of accuracy, completeness, safety, and patient comprehensibility. A total of 7 medical experts (including nephrologists and endocrinologists) and 50 diabetic patients were invited to evaluate the materials. Basic information on the patient evaluators was recorded.
Experts evaluated the PEMs from ERNIE Bot 4.0, GPT-4o, ChatGLM4, and ChatGPT-o1, plus the physician-sourced PEM. ERNIE Bot 4.0's non-guideline PEM and the physician-sourced PEM ranked as the top two. Patient assessments of these two top-scoring PEMs found that ERNIE Bot 4.0's non-guideline PEM performed as well as, if not slightly better than, the physician-sourced PEM in terms of patient comprehensibility, completeness, and safety. In addition, the non-guideline-based PEM was preferred by patients with a history of diabetes longer than 5 years and by patients with proteinuria. Surprisingly, the non-guideline PEMs from GPT-4o and ChatGLM4 outperformed their guideline-based counterparts.
The LLM-sourced PEMs, especially ERNIE Bot 4.0's non-guideline PEM for early DKD, performed comparably to the physician-sourced PEM in terms of accuracy, completeness, safety, and patient comprehensibility, and showed a high degree of feasibility. AI may have potential for broader applications in patient education in the near future.