Dihan Qais A, Brown Andrew D, Zaldivar Ana T, Montgomery Kendall E, Chauhan Muhammad Z, Abdelnaem Seif E, Ali Arsalan A, Jabbehdari Sayena, Azzam Amr, Sallam Ahmed B, Elhusseiny Abdelrahman M
J Pediatr Ophthalmol Strabismus. 2025 Jun 27:1-10. doi: 10.3928/01913913-20250515-01.
To evaluate the efficacy of large language models (LLMs) in generating patient education materials (PEMs) on retinopathy of prematurity (ROP).
ChatGPT-3.5 (OpenAI), ChatGPT-4 (OpenAI), and Gemini (Google AI) were compared on three separate prompts. Prompt A requested that each LLM generate a novel PEM on ROP. Prompt B requested generated PEMs at the 6th-grade reading level using the validated Simple Measure of Gobbledygook (SMOG) readability formula. Prompt C requested LLMs improve the readability of existing, human-written PEMs to a 6th-grade reading level. PEMs inserted into Prompt C were sourced through a Google search of "retinopathy of prematurity." Each PEM was analyzed for readability (SMOG, Flesch-Kincaid Grade Level [FKGL]), quality (Patient Education Materials Assessment Tool [PEMAT], DISCERN), and accuracy (Likert Misinformation Scale).
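As an illustration of the two readability measures named above, the following is a minimal sketch of the standard published FKGL and SMOG formulas. It is not the tooling used in the study, which the abstract does not describe. The function names and the vowel-run syllable counter are assumptions made for illustration; validated readability calculators count syllables more carefully, so scores from this sketch are only approximate.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a validated counter):
    count runs of vowels, with at least one syllable per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog(polysyllables: int, sentences: int) -> float:
    """SMOG grade from the number of words with 3+ syllables,
    normalized to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def readability(text: str) -> dict:
    """Estimate both scores for a block of text using naive tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "fkgl": fkgl(len(words), len(sentences), sum(syllables)),
        "smog": smog(sum(1 for n in syllables if n >= 3), len(sentences)),
    }
```

A score near 6 on either measure corresponds to the 6th-grade target used in Prompts B and C.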
LLM-generated PEMs were of high quality (median DISCERN = 4), understandable (PEMAT-U ≥ 70%), and accurate (Likert = 1). Prompt B generated more readable PEMs than Prompt A (P < .001). ChatGPT-4 and Gemini rewrote PEMs (Prompt C) from a baseline readability level (FKGL: 8.8 ± 1.9, SMOG: 8.6 ± 1.5) to the targeted 6th-grade reading level. Only ChatGPT-4 rewrites maintained high quality and reliability (median DISCERN = 4).
LLMs, particularly ChatGPT-4, can serve as strong supplementary tools to automate the process of generating readable and high-quality PEMs for parents on ROP.