Shukla Ishav Y, Sun Matthew Z
Department of Neurological Surgery, University of Texas Southwestern Medical Center, Dallas, TX, USA.
J Clin Neurosci. 2025 Aug;138:111410. doi: 10.1016/j.jocn.2025.111410. Epub 2025 Jun 20.
Online healthcare literature often exceeds the general population's literacy level. Our study assesses the readability of online and ChatGPT-generated materials on glioblastomas, meningiomas, and pituitary adenomas, comparing readability by tumor type, institutional affiliation, authorship, and source (websites vs. ChatGPT).
This cross-sectional study involved a Google search (Chrome browser, November 2024) using 'prognosis of [tumor type],' with the first 100 English-language, patient-directed results per tumor included. Websites were categorized by tumor type, institutional affiliation (university-affiliated vs. non-affiliated), and authorship (reviewed by a medical professional vs. non-reviewed). ChatGPT 4.0 was queried with three standardized questions per tumor, based on the most prevalent content found in patient-facing websites. Five metrics were assessed: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index, Coleman-Liau Index (CLI), and SMOG Index. Comparisons were conducted using Mann-Whitney U tests and t-tests.
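The first two metrics above are closed-form functions of sentence length and syllable density. As a minimal sketch (not the authors' actual tooling, and using a simple vowel-group heuristic for syllable counting rather than a dictionary-based counter), FRE and FKGL can be computed as:

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: one syllable per run of consecutive vowels; minimum of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    # Sentences approximated by terminal-punctuation runs; words by letter runs.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    wps = n_words / sentences    # average words per sentence
    spw = syllables / n_words    # average syllables per word
    return {
        # Flesch Reading Ease: higher = easier (90+ ~ 5th grade, <30 ~ graduate)
        "fre": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: approximate US school grade
        "fkgl": 0.39 * wps + 11.8 * spw - 15.59,
    }
```

Longer sentences and more syllables per word lower FRE and raise FKGL, which is why dense, polysyllabic AI-generated prose tends to score at the graduate level on both scales.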
No websites or ChatGPT responses met the readability benchmarks of 6th grade or below (AMA guideline) or 8th grade or below (NIH guideline). Of the websites, 50.4 % were at a 9th-12th grade level, 47.9 % at an undergraduate level, and 1.7 % at a graduate level. Websites reviewed by medical professionals had higher FRE (p = 0.03) and lower CLI (p = 0.009) than non-reviewed websites. Among ChatGPT responses, 93.3 % were at a graduate level. ChatGPT responses were less readable than websites on all five metrics (p < 0.001).
Online and ChatGPT-generated neuro-oncology materials exceed recommended readability standards, potentially hindering patients' ability to make informed decisions. Future efforts should focus on standardizing readability guidelines, refining AI-generated content, incorporating professional oversight consistently, and improving the accessibility of online neuro-oncology materials.