
Evaluating the Readability and Quality of Bladder Cancer Information from AI Chatbots: A Comparative Study Between ChatGPT, Google Gemini, Grok, Claude and DeepSeek.

Authors

Patel Kunjan, Radcliffe Robert

Affiliations

Royal Derby Hospital, Derby DE22 3NE, UK.

Publication

J Clin Med. 2025 Nov 3;14(21):7804. doi: 10.3390/jcm14217804.

Abstract

Background: Artificial Intelligence (AI)-based chatbots such as ChatGPT are widely accessible and are quickly becoming a source of information for patients, as opposed to traditional Google searches. We assessed the quality of information on bladder cancer provided by five AI chatbots: ChatGPT 4o, Google Gemini 2.0 Flash, Grok 3, Claude 3.7 Sonnet and DeepSeek R1. Their responses were analysed in terms of readability indices, and two consultant urologists rated the quality of the information using the validated DISCERN tool.

Methods: The top 10 most frequently asked questions about bladder cancer were identified using Google Trends. These questions were then posed to the five AI chatbots and their responses collected. No prompts were used, reflecting the natural-language queries that patients would use. The responses were analysed for readability using five validated indices: Flesch Reading Ease (FRE), Flesch-Kincaid Reading Grade Level (FKRGL), the Gunning Fog Index, the Coleman-Liau Index and the SMOG Index. Two consultant urologists then independently assessed the chatbots' responses using the DISCERN tool, which rates the quality of health information on a five-point Likert scale. Inter-rater agreement was calculated using Cohen's Kappa and the intraclass correlation coefficient (ICC).

Results: ChatGPT 4o led the readability scores, with the highest Flesch Reading Ease score (59.4) and the lowest average reading grade level (7.0) required to understand the material. Grok 3 was a close second (FRE 58.3, grade level 8.7). Claude 3.7 Sonnet used the most complex language, scoring the lowest FRE (44.9) with the highest grade level (9.5) and the highest complexity on the other indices. In the DISCERN analysis, Grok 3 received the highest average score (52.0), followed closely by ChatGPT 4o (50.5). Inter-rater agreement was highest for ChatGPT 4o (ICC 0.791; Kappa 0.437) and lowest for Grok 3 (ICC 0.339; Kappa 0.0; weighted Kappa 0.335).

Conclusions: All five AI chatbots provided generally good-quality answers to questions about bladder cancer, with zero hallucinations. ChatGPT 4o was the overall winner, with the best readability metrics, strong DISCERN ratings and the highest inter-rater agreement.
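The readability indices and agreement statistic named above follow standard published formulas. The sketch below (function names and the naive syllable heuristic are illustrative choices, not the paper's code) shows how Flesch Reading Ease, the Flesch-Kincaid Grade Level and unweighted Cohen's Kappa can be computed:

```python
import re

def count_syllables(word: str) -> int:
    """Crude syllable estimate: count vowel groups, drop a silent final 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_scores(text: str) -> tuple:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw   # higher = easier to read
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # US school grade level
    return fre, fkgl

def cohen_kappa(rater1, rater2) -> float:
    """Unweighted Cohen's Kappa for two raters' categorical scores."""
    n = len(rater1)
    po = sum(a == b for a, b in zip(rater1, rater2)) / n   # observed agreement
    cats = set(rater1) | set(rater2)
    pe = sum((list(rater1).count(c) / n) * (list(rater2).count(c) / n)
             for c in cats)                                # chance agreement
    return (po - pe) / (1 - pe)
```

For example, a short monosyllabic sentence scores a very high FRE and a very low grade level, and two raters giving identical DISCERN scores yield a Kappa of 1.0. Production tools typically use a pronunciation dictionary rather than a vowel-group heuristic for syllable counts.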

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/109d/12610445/0d482e285a27/jcm-14-07804-g001.jpg


