Dihan Qais A, Brown Andrew D, Chauhan Muhammad Z, Alzein Ahmad F, Abdelnaem Seif E, Kelso Sean D, Rahal Dania A, Park Royce, Ashraf Mohammadali, Azzam Amr, Morsi Mahmoud, Warner David B, Sallam Ahmed B, Saeed Hajirah N, Elhusseiny Abdelrahman M
Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, IL, USA.
Department of Ophthalmology, Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
Eye (Lond). 2025 Apr;39(6):1115-1122. doi: 10.1038/s41433-024-03476-5. Epub 2024 Dec 16.
BACKGROUND/OBJECTIVES: Dry eye disease (DED) is an exceedingly common diagnosis, yet recent analyses have shown that patient education materials (PEMs) on DED are of low quality and readability. Our study evaluated the utility and performance of three large language models (LLMs) in enhancing existing PEMs and generating new PEMs on DED.
SUBJECTS/METHODS: We evaluated PEMs generated by ChatGPT-3.5, ChatGPT-4, and Gemini Advanced in response to three separate prompts. Prompts A and B asked each model to generate PEMs on DED, with Prompt B additionally specifying a 6th-grade reading level as measured by the SMOG (Simple Measure of Gobbledygook) readability formula. Prompt C asked for a rewrite of existing PEMs at a 6th-grade reading level. Each PEM was assessed for readability (SMOG; FKGL: Flesch-Kincaid Grade Level), quality (PEMAT: Patient Education Materials Assessment Tool; DISCERN), and accuracy (Likert misinformation scale).
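For readers unfamiliar with the two readability formulas named above, the sketch below shows how FKGL and SMOG grades are computed from sentence, word, and syllable counts. This is an illustrative approximation only: the vowel-group syllable counter is a naive heuristic, and the study itself would have used established readability tooling rather than code like this.

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic vowel-group counter; dedicated readability tools use
    # dictionary-based syllabification, so treat this as approximate.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # drop a typical silent trailing "e"
    return max(count, 1)

def readability_scores(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)

    # Flesch-Kincaid Grade Level:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    fkgl = (0.39 * (len(words) / len(sentences))
            + 11.8 * (sum(syllables) / len(words)) - 15.59)

    # SMOG grade, normalized to a 30-sentence sample:
    # 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
    smog = 1.0430 * (polysyllables * 30 / len(sentences)) ** 0.5 + 3.1291

    return {"FKGL": round(fkgl, 1), "SMOG": round(smog, 1)}

if __name__ == "__main__":
    # Hypothetical 6th-grade-style PEM snippet, for illustration only.
    sample = ("Dry eye disease happens when your eyes do not make enough "
              "tears. Your eyes may feel itchy or burn. An eye doctor can "
              "suggest drops that help.")
    print(readability_scores(sample))
```

Both formulas map text statistics onto a US school-grade scale, which is why a baseline of roughly 8 and a target of 6 are directly comparable across the two measures.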
RESULTS: All LLM-generated PEMs in response to Prompts A and B were of high quality (median DISCERN = 4), understandable (PEMAT understandability ≥70%), and accurate (Likert score = 1). LLM-generated PEMs were not actionable (PEMAT actionability <70%). ChatGPT-4 and Gemini Advanced rewrote existing PEMs (Prompt C) from a baseline readability level (FKGL: 8.0 ± 2.4; SMOG: 7.9 ± 1.7) to the targeted 6th-grade reading level; rewrites contained little to no misinformation (median Likert misinformation score = 1, range: 1-2). However, only ChatGPT-4 rewrote PEMs while maintaining high quality and reliability (median DISCERN = 4).
CONCLUSIONS: LLMs (notably ChatGPT-4) were able to generate and rewrite PEMs on DED that were readable, accurate, and of high quality. Our study underscores the value of leveraging LLMs as supplementary tools for improving PEMs.