Alamleh Salahaldin, Mavedatnia Dorsa, Francis Gizelle, Le Trung, Davies Joel, Lin Vincent, Lee John J W
Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada.
Department of Otolaryngology-Head and Neck Surgery, University of Toronto, Toronto, ON, Canada.
J Otolaryngol Head Neck Surg. 2025 Jan-Dec;54:19160216251360651. doi: 10.1177/19160216251360651. Epub 2025 Aug 8.
Importance: Online patient education materials (PEMs) and large language model (LLM) outputs can provide critical health information for patients, yet their readability, quality, and reliability remain unclear for Meniere's disease.
Objective: To assess the readability, quality, and reliability of online PEMs and LLM-generated outputs on Meniere's disease.
Design: Cross-sectional study.
Setting: PEMs were identified from the first 40 Google Search results based on inclusion criteria. LLM outputs were extracted from unique interactions with ChatGPT and Google Gemini.
Participants: Thirty-one PEMs met inclusion criteria. LLM outputs were obtained from 3 unique interactions each with ChatGPT and Google Gemini.
Intervention: Readability was assessed using 5 validated formulas: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning-Fog Index, Coleman-Liau Index, and Simple Measure of Gobbledygook (SMOG) Index. Quality and reliability were assessed by 2 independent raters using the DISCERN tool.
Main Outcome Measures: Readability was assessed for adherence to the American Medical Association's (AMA) sixth-grade reading level guideline. Source reliability, as well as the completeness, accuracy, and clarity of treatment-related information, was evaluated using the DISCERN tool.
Results: The most common PEM source type was academic institutions (32.2%), and the majority of PEMs (61.3%) originated from the United States. The mean FRE score for PEMs corresponded to a 10th- to 12th-grade reading level, whereas ChatGPT and Google Gemini outputs were classified at postgraduate and college reading levels, respectively. Only 16.1% of PEMs met the AMA's sixth-grade readability recommendation on the FKGL index, and no LLM outputs achieved this standard. Overall DISCERN scores categorized PEMs and ChatGPT outputs as "poor quality," while Google Gemini outputs were rated "fair quality." No significant differences were found in readability or DISCERN scores across PEM source types, and no significant correlation was identified between PEM readability, quality, and reliability scores.
Conclusions: Online PEMs and LLM-generated outputs on Meniere's disease do not meet AMA readability standards and are generally of poor quality and reliability.
Relevance: Future PEMs should prioritize improved readability while maintaining high-quality, reliable information to better support decision-making for patients with Meniere's disease.
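For readers unfamiliar with the readability metrics named above, the sketch below shows how two of them, FRE and FKGL, are computed from their standard published formulas. It is a minimal illustration only: the regex-based sentence splitting and the vowel-group syllable counter are simplified heuristics assumed here for demonstration, not the tooling used in the study.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count contiguous vowel groups, with a minimum of 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> dict:
    """Compute Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)

    wps = len(words) / len(sentences)            # average words per sentence
    spw = syllables / len(words)                 # average syllables per word

    fre = 206.835 - 1.015 * wps - 84.6 * spw     # higher score = easier to read
    fkgl = 0.39 * wps + 11.8 * spw - 15.59       # approximate US school grade level
    return {"FRE": round(fre, 1), "FKGL": round(fkgl, 1)}

# Example: a short passage of hypothetical patient-facing text
sample = ("Meniere's disease causes vertigo, hearing loss, and tinnitus. "
          "Treatment options include dietary changes and medication.")
print(readability(sample))
```

Under the AMA guideline cited in the abstract, text at or below a sixth-grade level would correspond roughly to an FKGL of 6 or lower.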