Readability, understandability, and quality of online education materials and large language models for retrograde cricopharyngeal muscle dysfunction.

Authors

Türe Nurullah, Tahir Emel, Enver Necati

Affiliations

Department of Otorhinolaryngology, Kütahya Health Sciences University, Kütahya, Türkiye.

Department of Otorhinolaryngology, Ondokuz Mayıs University, Samsun, Türkiye.

Publication information

Eur Arch Otorhinolaryngol. 2025 Aug 13. doi: 10.1007/s00405-025-09628-x.

DOI:10.1007/s00405-025-09628-x

PMID:40802099

Abstract

OBJECTIVE

This study aims to evaluate online patient education materials on retrograde cricopharyngeal dysfunction (RCPD) and to compare their readability, understandability, and quality with those of content generated by large language models (LLMs).

METHOD

A web search conducted in December 2024 yielded 51 online resources, which were evaluated together with responses from four LLMs (ChatGPT 4.0, Gemini 1.5 Flash, Perplexity GPT-3.5, DeepSeek-V2.5). Readability was analyzed using Readable.io, understandability and actionability were assessed using PEMAT, and information quality was assessed using DISCERN.

RESULTS

The average readability of both the online materials and the LLM responses was at the 11th-12th grade level. The Flesch Reading Ease score was lowest for the LLMs, especially for the DeepSeek-V2.5 model (24.21). While PEMAT understandability scores were adequate for online materials (82%) and LLMs (79%), actionability was low across all groups (25-37%). DISCERN analyses showed that both sources of information were of limited quality in supporting treatment decisions.
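
For readers unfamiliar with the readability metrics reported above, the following is a minimal sketch of the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. It is illustrative only: the function names are hypothetical, the word, sentence, and syllable counts are assumed to be supplied by the caller, and nothing here is claimed to reproduce how Readable.io tokenizes text or how the authors computed their scores.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula (approximate US school grade)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a short passage, not data from the study.
    words, sentences, syllables = 100, 5, 150
    print(round(flesch_reading_ease(words, sentences, syllables), 3))
    print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
```

On the Flesch Reading Ease scale, scores below 30 are conventionally read as very difficult text, which is the range the reported DeepSeek-V2.5 score (24.21) falls into.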

CONCLUSION

This study revealed that both online and LLM-generated materials on RCPD exceeded the recommended readability levels. Although the materials demonstrated acceptable understandability, they exhibited low actionability and inadequate overall quality, emphasizing the need for more patient-centered digital health communication.
