Delsoz Mohammad, Hassan Amr, Nabavi Amin, Rahdar Amir, Fowler Brian, Kerr Natalie C, Ditta Lauren Claire, Hoehn Mary E, DeAngelis Margaret M, Grzybowski Andrzej, Tham Yih-Chung, Yousefi Siamak
Hamilton Eye Institute, Department of Ophthalmology, University of Tennessee Health Science Center, 930 Madison Ave., Suite 471, Memphis, TN, 38163, USA.
Department of Ophthalmology, Gavin Herbert Eye Institute, University of California, Irvine, CA, USA.
Ophthalmol Ther. 2025 Jun;14(6):1281-1295. doi: 10.1007/s40123-025-01142-x. Epub 2025 Apr 21.
INTRODUCTION: This study aimed to evaluate the performance of three large language models (LLMs), namely ChatGPT-3.5, ChatGPT-4o (o1 Preview), and Google Gemini, in producing patient education materials (PEMs) and in improving the readability of online PEMs on childhood myopia.

METHODS: LLM-generated responses were assessed using three prompts. Prompt A requested: "Write educational material on childhood myopia." Prompt B added a modifier specifying "a sixth-grade reading level using the FKGL (Flesch-Kincaid Grade Level) readability formula." Prompt C asked the models to rewrite existing PEMs to a sixth-grade level using FKGL. Responses were assessed for quality (DISCERN tool), readability (FKGL and SMOG, the Simple Measure of Gobbledygook), understandability and actionability (Patient Education Materials Assessment Tool, PEMAT), and accuracy.

RESULTS: ChatGPT-4o (o1) and ChatGPT-3.5 generated good-quality PEMs (DISCERN 52.8 and 52.7, respectively); however, quality declined from prompt A to prompt B (p = 0.001 and p = 0.013). Google Gemini produced fair-quality PEMs (DISCERN 43), but quality improved with prompt B (p = 0.02). All PEMs exceeded the 70% PEMAT understandability threshold but fell short of the 70% actionability threshold (scoring 40%). No misinformation was identified. Readability improved with prompt B; ChatGPT-4o (o1) and ChatGPT-3.5 achieved a sixth-grade level or below (FKGL 6 ± 0.6 and 6.2 ± 0.3), while Google Gemini did not (FKGL 7 ± 0.6). ChatGPT-4o (o1) outperformed Google Gemini in readability (p < 0.001) but was comparable to ChatGPT-3.5 (p = 0.846). Prompt C improved readability across all LLMs, with ChatGPT-4o (o1) showing the largest gains (FKGL 5.8 ± 1.5; p < 0.001).

CONCLUSIONS: ChatGPT-4o (o1 Preview) shows promise for producing accurate, good-quality, understandable PEMs and for improving the readability of existing online PEMs on childhood myopia.
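For reference, the FKGL and SMOG scores reported above are simple functions of sentence, word, and syllable counts. The sketch below shows how both published formulas are computed; the syllable counter is a crude vowel-group heuristic (the study does not state its exact tooling), so treat this as illustrative rather than as the authors' implementation.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent trailing 'e'.
    Production tools typically use pronunciation dictionaries instead."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = [count_syllables(w) for w in words]
    # Flesch-Kincaid Grade Level (Kincaid et al., 1975):
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    fkgl = (0.39 * (len(words) / len(sentences))
            + 11.8 * (sum(syllables) / len(words)) - 15.59)
    # SMOG (McLaughlin, 1969): based on words of 3+ syllables,
    # normalized to a 30-sentence sample.
    poly = sum(1 for s in syllables if s >= 3)
    smog = 1.0430 * (poly * 30 / len(sentences)) ** 0.5 + 3.1291
    return {"FKGL": round(fkgl, 1), "SMOG": round(smog, 1)}

print(readability(
    "Myopia means the eye focuses light in front of the retina. "
    "Children with myopia see near objects clearly, but distant "
    "objects look blurry."
))
```

An FKGL of 6.0 corresponds to a US sixth-grade reading level, which is why prompts B and C target FKGL ≤ 6.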