Large Language Models: Pioneering New Educational Frontiers in Childhood Myopia.

Author Information

Delsoz Mohammad, Hassan Amr, Nabavi Amin, Rahdar Amir, Fowler Brian, Kerr Natalie C, Ditta Lauren Claire, Hoehn Mary E, DeAngelis Margaret M, Grzybowski Andrzej, Tham Yih-Chung, Yousefi Siamak

Affiliations

Hamilton Eye Institute, Department of Ophthalmology, University of Tennessee Health Science Center, 930 Madison Ave., Suite 471, Memphis, TN, 38163, USA.

Department of Ophthalmology, Gavin Herbert Eye Institute, University of California, Irvine, CA, USA.

Publication Information

Ophthalmol Ther. 2025 Jun;14(6):1281-1295. doi: 10.1007/s40123-025-01142-x. Epub 2025 Apr 21.

Abstract

INTRODUCTION

This study aimed to evaluate the performance of three large language models (LLMs), namely ChatGPT-3.5, ChatGPT-4o (o1 Preview), and Google Gemini, in producing patient education materials (PEMs) and improving the readability of online PEMs on childhood myopia.

METHODS

LLM-generated responses were assessed using three prompts. Prompt A requested: "Write educational material on childhood myopia." Prompt B added a modifier specifying "a sixth-grade reading level using the FKGL (Flesch-Kincaid Grade Level) readability formula." Prompt C asked the models to rewrite existing PEMs to a sixth-grade level using FKGL. Responses were assessed for quality (DISCERN tool), readability (FKGL and SMOG (Simple Measure of Gobbledygook)), understandability and actionability (Patient Education Materials Assessment Tool, PEMAT), and accuracy.
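
For context, the two readability formulas are standard and fully specified: FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59, and SMOG = 1.0430 * sqrt(polysyllables * 30/sentences) + 3.1291. The minimal Python sketch below illustrates how such grade-level scores can be computed; the syllable counter is a rough heuristic added for illustration, not the tooling used in the study.

import math
import re

def count_syllables(word):
    # Rough heuristic: count vowel groups, subtracting a silent final 'e'.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def fkgl(text):
    # Flesch-Kincaid Grade Level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / max(len(words), 1)) - 15.59

def smog(text):
    # SMOG grade: 1.0430 * sqrt(polysyllables * 30/sentences) + 3.1291,
    # where polysyllables are words of three or more syllables.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    poly = sum(1 for w in re.findall(r"[A-Za-z]+", text) if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(poly * 30 / sentences) + 3.1291

Under this scoring, a generated PEM would meet the study's readability target when fkgl(text) returns 6 or below.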

RESULTS

ChatGPT-4o (o1) and ChatGPT-3.5 generated good-quality PEMs (DISCERN 52.8 and 52.7, respectively); however, quality declined from prompt A to prompt B (p = 0.001 and p = 0.013). Google Gemini produced fair-quality PEMs (DISCERN 43) but improved with prompt B (p = 0.02). All PEMs exceeded the 70% PEMAT understandability threshold but fell short of the 70% actionability threshold, scoring 40%. No misinformation was identified. Readability improved with prompt B: ChatGPT-4o (o1) and ChatGPT-3.5 achieved a sixth-grade level or below (FKGL 6 ± 0.6 and 6.2 ± 0.3), while Google Gemini did not (FKGL 7 ± 0.6). ChatGPT-4o (o1) outperformed Google Gemini in readability (p < 0.001) but was comparable to ChatGPT-3.5 (p = 0.846). Prompt C improved readability across all LLMs, with ChatGPT-4o (o1 Preview) showing the largest gains (FKGL 5.8 ± 1.5; p < 0.001).

CONCLUSIONS

ChatGPT-4o (o1 Preview) demonstrates potential for producing accurate, good-quality, and understandable PEMs and for improving online PEMs on childhood myopia.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/680a/12069199/0aec89379ff6/40123_2025_1142_Fig1_HTML.jpg
