Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model.
Author Information
Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona, USA.
Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA.
Publication Information
Otolaryngol Head Neck Surg. 2024 Dec;171(6):1751-1757. doi: 10.1002/ohn.927. Epub 2024 Aug 6.
OBJECTIVE
To use an artificial intelligence (AI)-powered large language model (LLM) to improve readability of patient handouts.
STUDY DESIGN
Review of online material modified by AI.
SETTING
Academic center.
METHODS
Five handout materials obtained from the American Rhinologic Society (ARS) and the American Academy of Facial Plastic and Reconstructive Surgery (AAFPRS) websites were assessed using validated readability metrics. The handouts were then input into OpenAI's ChatGPT-4 with the prompt: "Rewrite the following at a 6th-grade reading level." The understandability and actionability of both the original and LLM-revised versions were evaluated using the Patient Education Materials Assessment Tool (PEMAT). Results were compared using Wilcoxon rank-sum tests.
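As a rough illustration of this workflow (a minimal sketch, not the authors' actual pipeline), the snippet below pairs the quoted prompt with OpenAI's Python client, the open-source textstat package (which implements all five readability metrics reported here), and SciPy's Wilcoxon rank-sum test. The model identifier, corpus, and function names are assumptions for illustration only.

```python
# Illustrative sketch only: assumes the `openai`, `textstat`, and `scipy`
# packages. The prompt is quoted from the study; everything else is hypothetical.
from openai import OpenAI
from scipy.stats import ranksums
import textstat

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Rewrite the following at a 6th-grade reading level."

def revise(handout: str) -> str:
    """Ask the model to rewrite one handout, as described in METHODS."""
    response = client.chat.completions.create(
        model="gpt-4",  # stand-in for the ChatGPT-4 interface used in the study
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{handout}"}],
    )
    return response.choices[0].message.content

def readability(text: str) -> dict:
    """Score one text with the five metrics reported in RESULTS."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "gunning_fog": textstat.gunning_fog(text),
        "smog": textstat.smog_index(text),
        "coleman_liau": textstat.coleman_liau_index(text),
        "automated_readability_index": textstat.automated_readability_index(text),
    }

# Compare original vs revised scores with a Wilcoxon rank-sum test,
# mirroring the study's statistical comparison.
originals = ["<handout 1 text>", "<handout 2 text>"]  # placeholder corpus
revised = [revise(t) for t in originals]
orig_fre = [readability(t)["flesch_reading_ease"] for t in originals]
rev_fre = [readability(t)["flesch_reading_ease"] for t in revised]
stat, p_value = ranksums(rev_fre, orig_fre)
```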
RESULTS
The mean readability scores of the standard (ARS, AAFPRS) materials corresponded to "difficult," with reading levels ranging from high school to university. Conversely, the LLM-revised handouts averaged a seventh-grade reading level. The LLM-revised handouts scored better on nearly all readability metrics tested: Flesch-Kincaid Reading Ease (70.8 vs 43.9; P < .05), Gunning Fog Score (10.2 vs 14.42; P < .05), Simple Measure of Gobbledygook (9.9 vs 13.1; P < .05), Coleman-Liau Index (8.8 vs 12.6; P < .05), and Automated Readability Index (8.2 vs 10.7; P = .06). PEMAT scores were significantly higher for the LLM-revised handouts in understandability (91% vs 74%; P < .05), with similar actionability (42% vs 34%; P = .15), compared with the standard materials.
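For context on the headline metric (not restated in the abstract): the Flesch Reading Ease score rises as sentences and words get shorter, so higher scores indicate easier text. Its standard formula is:

```latex
\mathrm{FRE} = 206.835
  - 1.015 \left( \frac{\text{total words}}{\text{total sentences}} \right)
  - 84.6 \left( \frac{\text{total syllables}}{\text{total words}} \right)
```

On the conventional interpretation bands, the revised handouts' mean of 70.8 falls in the "fairly easy" (roughly seventh-grade) range, while the originals' 43.9 falls in the "difficult" (college-level) range, consistent with the grade-level findings above.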
CONCLUSION
With simple prompting, ChatGPT can revise patient-facing handouts to tailor information and improve readability. This study demonstrates the utility of LLMs in rewriting patient handouts, and they may serve as a tool to help optimize patient education materials.
LEVEL OF EVIDENCE
Level VI.