Türe Nurullah, Tahir Emel, Enver Necati
Department of Otorhinolaryngology, Kütahya Health Sciences University, Kütahya, Türkiye.
Department of Otorhinolaryngology, Ondokuz Mayıs University, Samsun, Türkiye.
Eur Arch Otorhinolaryngol. 2025 Aug 13. doi: 10.1007/s00405-025-09628-x.
This study aims to evaluate patient education materials on retrograde cricopharyngeal dysfunction (RCPD) by comparing the readability, understandability, and quality of online resources with those of content generated by large language models (LLMs).
A web search conducted in December 2024 identified 51 online resources, which were evaluated alongside responses from four LLMs (ChatGPT 4.0, Gemini 1.5 Flash, Perplexity GPT-3.5, DeepSeek-V2.5). Readability was analyzed using Readable.io, understandability and actionability were assessed using PEMAT, and information quality was assessed using DISCERN.
The average readability of both the online materials and the LLM responses was at the 11th-12th grade level. The Flesch Reading Ease score was lowest for the LLMs, especially for the DeepSeek-V2.5 model (24.21). While PEMAT understandability scores were adequate for both the online materials (82%) and the LLM responses (79%), actionability was low across all groups (25-37%). DISCERN analyses showed that both sources of information were of limited quality in supporting treatment decisions.
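For readers unfamiliar with the readability metrics reported above, the following is a minimal sketch of the published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. It is illustrative only: the study used Readable.io, whose internal counting rules are not described here, and the abstract does not state which grade-level index produced the 11th-12th grade figure. The function names and the example counts are assumptions made for this sketch, and the word, sentence, and syllable counts are taken as inputs rather than derived from text.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts; higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (approximate US school grade)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short passage (not data from the study).
example_ease = flesch_reading_ease(words=100, sentences=5, syllables=150)
example_grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
```

Both formulas penalize long sentences and polysyllabic words, which is why dense clinical wording pushes the Reading Ease score down. On the conventional interpretation of the scale, scores below 30 are labeled very difficult, the band in which the reported DeepSeek-V2.5 score of 24.21 falls.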
This study revealed that both online and LLM-generated materials on RCPD exceeded the recommended readability levels. Although the materials demonstrated acceptable understandability, they exhibited low actionability and inadequate overall quality, emphasizing the need for more patient-centered digital health communication.