Dihan Qais A, Brown Andrew D, Zaldivar Ana T, Chauhan Muhammad Z, Eleiwa Taher K, Hassan Amr K, Solyman Omar, Gise Ryan, Phillips Paul H, Sallam Ahmed B, Elhusseiny Abdelrahman M
Chicago Medical School (QAD), Rosalind Franklin University of Medicine and Science, North Chicago, IL; Department of Ophthalmology (QAD, MZC, PHP, ABS, AME), Harvey and Bernice Jones Eye Institute; UAMS College of Medicine (ADB), University of Arkansas for Medical Sciences, Little Rock, AR; Herbert Wertheim College of Medicine (ATZ), Florida International University; Mary & Edward Norton Library of Ophthalmology (ATZ), Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, FL; Department of Ophthalmology (TKE), Benha Faculty of Medicine, Benha University; Department of Ophthalmology (AKH), Faculty of Medicine, South Valley University, Qena; Department of Ophthalmology (OS), Research Institute of Ophthalmology, Giza, Egypt; Department of Ophthalmology (OS), Qassim University Medical City, Al-Qassim, Saudi Arabia; Department of Ophthalmology (RG, AME), Boston Children's Hospital, Harvard Medical School, MA; and Department of Ophthalmology (ABS), Faculty of Medicine, Ain Shams University, Cairo, Egypt.
Neurol Clin Pract. 2025 Feb;15(1):e200366. doi: 10.1212/CPJ.0000000000200366. Epub 2024 Oct 8.
We evaluated the performance of 3 large language models (LLMs) in generating patient education materials (PEMs) on idiopathic intracranial hypertension (IIH) and in improving the readability of prewritten PEMs on IIH.
This cross-sectional comparative study compared 3 LLMs (ChatGPT-3.5, ChatGPT-4, and Google Bard) on their ability to generate and rewrite PEMs on IIH using 3 prompts. Prompt A (control prompt): "Can you write a patient-targeted health information handout on idiopathic intracranial hypertension that is easily understandable by the average American?"; Prompt B (modifier statement + control prompt): "Given patient education materials are recommended to be written at a 6th-grade reading level, using the SMOG readability formula, can you write a patient-targeted health information handout on idiopathic intracranial hypertension that is easily understandable by the average American?"; and Prompt C: "Given patient education materials are recommended to be written at a 6th-grade reading level, using the SMOG readability formula, can you rewrite the following text to a 6th-grade reading level: []." We compared the generated and rewritten PEMs, along with the first 20 eligible PEMs on IIH returned by a Google search, in terms of readability (Simple Measure of Gobbledygook [SMOG] and Flesch-Kincaid Grade Level [FKGL]), quality (DISCERN and Patient Education Materials Assessment Tool [PEMAT]), and accuracy (Likert misinformation scale).
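As a rough illustration of the two readability formulas named above, the sketch below computes SMOG and FKGL grades with their standard published coefficients. The abstract does not specify which software the authors used; the vowel-group syllable counter and the punctuation-based sentence splitter here are simplifying assumptions, so scores from this sketch will differ somewhat from those of dedicated readability tools.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a heuristic assumption, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent "e" ("make"), but keep consonant + "le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def _sentences_and_words(text: str) -> tuple[list[str], list[str]]:
    # Naive split on terminal punctuation; abbreviations such as "Dr." will be over-split.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    return sentences, words


def smog_grade(text: str) -> float:
    """SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences, words = _sentences_and_words(text)
    # Polysyllables are words with 3 or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # SMOG was calibrated on 30-sentence samples; the 30 / sentences factor rescales shorter texts.
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    sentences, words = _sentences_and_words(text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    sample = "Idiopathic intracranial hypertension raises pressure around the brain."
    print(f"SMOG: {smog_grade(sample):.1f}, FKGL: {fkgl(sample):.1f}")
```

Under these assumptions, a handout scoring at or below 6 on both measures would meet the 6th-grade target stated in Prompts B and C.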
The generated PEMs were of high quality, understandability, and accuracy (median DISCERN score ≥4, PEMAT understandability ≥70%, Likert misinformation scale = 1). Only ChatGPT-4 generated PEMs at the specified 6th-grade reading level (SMOG: 5.5 ± 0.6, FKGL: 5.6 ± 0.7). Likewise, only ChatGPT-4 used Prompt C to rewrite the original published PEMs to below a 6th-grade reading level without a decrease in quality, understandability, or accuracy (SMOG: 5.6 ± 0.6, FKGL: 5.7 ± 0.8, p < 0.001; DISCERN ≥4; Likert misinformation = 1).
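The PEMAT understandability percentage reported above is, under the tool's standard scoring, the share of applicable items rated "Agree". The sketch below assumes that scoring (Agree = 1, Disagree = 0, not-applicable items excluded) and applies the 70% cutoff cited in the results; the function names and the ratings in the example are hypothetical and are not study data.

```python
from typing import Optional, Sequence


def pemat_score(ratings: Sequence[Optional[int]]) -> float:
    """Return a PEMAT percentage score (hypothetical helper).

    Each rating is 1 (Agree), 0 (Disagree), or None (not applicable, excluded).
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("at least one item must be applicable")
    if any(r not in (0, 1) for r in applicable):
        raise ValueError("ratings must be 0, 1, or None")
    return 100.0 * sum(applicable) / len(applicable)


def meets_threshold(score: float, threshold: float = 70.0) -> bool:
    """True when the score reaches the cutoff (70% in the abstract above)."""
    return score >= threshold


if __name__ == "__main__":
    # Hypothetical ratings for one handout: 3 Agree, 1 Disagree, 1 N/A.
    example = [1, 1, 0, None, 1]
    score = pemat_score(example)
    print(score, meets_threshold(score))
```

Excluding not-applicable items from the denominator keeps handouts that lack, for example, visual aids from being penalized for items that do not apply to them.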
In conclusion, LLMs, particularly ChatGPT-4, can produce high-quality, readable PEMs on IIH. They can also serve as supplementary tools to improve the readability of prewritten PEMs while maintaining quality and accuracy.