Shukla Ishav Y, Sun Matthew Z
Department of Neurological Surgery, University of Texas Southwestern Medical Center, Dallas, TX, USA.
J Clin Neurosci. 2025 Aug;138:111410. doi: 10.1016/j.jocn.2025.111410. Epub 2025 Jun 20.
Online healthcare literature often exceeds the general population's literacy level. Our study assesses the readability of online and ChatGPT-generated materials on glioblastomas, meningiomas, and pituitary adenomas, comparing readability by tumor type, institutional affiliation, authorship, and source (websites vs. ChatGPT).
This cross-sectional study involved a Google search (Chrome browser, November 2024) using 'prognosis of [tumor type],' with the first 100 English-language, patient-directed results per tumor included. Websites were categorized by tumor type, institutional affiliation (university-affiliated vs. non-affiliated), and authorship (reviewed by a medical professional vs. non-reviewed). ChatGPT 4.0 was queried with three standardized questions per tumor, based on the most prevalent content found in patient-facing websites. Five metrics were assessed: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index, Coleman-Liau Index (CLI), and SMOG Index. Comparisons were conducted using Mann-Whitney U tests and t-tests.
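The first two metrics above are closed-form functions of sentence length and syllable density. As a minimal sketch (not the authors' actual tooling, and using a simple vowel-group heuristic for syllable counting rather than a dictionary-based counter), FRE and FKGL can be computed as:

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: one syllable per run of consecutive vowels; minimum of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    # Sentences approximated by terminal-punctuation runs; words by letter runs.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    wps = n_words / sentences    # average words per sentence
    spw = syllables / n_words    # average syllables per word
    return {
        # Flesch Reading Ease: higher = easier (90+ ~ 5th grade, <30 ~ graduate)
        "fre": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: approximate US school grade
        "fkgl": 0.39 * wps + 11.8 * spw - 15.59,
    }
```

Longer sentences and more syllables per word lower FRE and raise FKGL, which is why dense, polysyllabic AI-generated prose tends to score at the graduate level on both scales.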
No websites or ChatGPT responses met the readability benchmarks of 6th grade or below (AMA guideline) or 8th grade or below (NIH guideline). Of the websites, 50.4 % were at a 9th-12th grade level, 47.9 % at an undergraduate level, and 1.7 % at a graduate level. Websites reviewed by medical professionals had higher FRE (p = 0.03) and lower CLI (p = 0.009) than non-reviewed websites. Among ChatGPT responses, 93.3 % were at a graduate level. ChatGPT responses were less readable than websites on all five metrics (p < 0.001).
Online and ChatGPT-generated neuro-oncology materials exceed recommended readability standards, potentially hindering patients' ability to make informed decisions. Future efforts should focus on standardizing readability guidelines, refining AI-generated content, incorporating professional oversight consistently, and improving the accessibility of online neuro-oncology materials.