Nelson Houston C, Beauchamp Morgan T, Pace April A
Dermatology, Eastern Virginia Medical School, Macon and Joan Brock Virginia Health Sciences at Old Dominion University, Norfolk, USA.
Public Health, Eastern Virginia Medical School, Macon and Joan Brock Virginia Health Sciences at Old Dominion University, Norfolk, USA.
Cureus. 2025 Jun 22;17(6):e86543. doi: 10.7759/cureus.86543. eCollection 2025 Jun.
Background: The internet has become a primary source of health information for the public, with important implications for patient decision-making and public health outcomes. However, the quality and readability of this content vary widely. With the rise of generative artificial intelligence (AI) tools such as ChatGPT and Gemini, new challenges and opportunities have emerged in how patients access and interpret medical information.
Objective: To evaluate and compare the quality, credibility, and readability of consumer health information provided by traditional search engines (Google, Bing) and generative AI platforms (ChatGPT, Gemini) using three validated instruments: DISCERN, the JAMA Benchmark Criteria, and Flesch-Kincaid readability metrics.
Methods: Twenty health-related webpages per platform were collected using a standardized query across Google, Bing, Gemini, and ChatGPT. Each source was assessed independently by two reviewers using the DISCERN instrument and the adapted JAMA benchmark criteria. Readability was evaluated using the Flesch Reading Ease and Flesch-Kincaid Grade Level scores. One-way ANOVA with Bonferroni correction was used to compare platform performance, and Cohen's kappa measured inter-rater reliability.
Results: Google achieved the highest mean scores for both quality and credibility (DISCERN: 3.33 ± 0.53; JAMA: 3.70 ± 0.44), followed by Bing, Gemini, and ChatGPT. ChatGPT received the lowest scores across all quality measures. Readability analysis revealed no statistically significant differences between platforms; however, all content exceeded recommended reading levels for public health information. Cohen's kappa indicated strong inter-rater agreement across DISCERN items.
Conclusion: Google remains the most reliable source of high-quality, readable health information among the evaluated platforms. Generative AI tools such as ChatGPT and Gemini, while increasingly popular, exhibited notable limitations in accuracy and transparency, along with greater textual complexity. These findings highlight the need for improved oversight, transparency, and user education regarding AI-generated health content.
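For context, the two Flesch-Kincaid metrics named in the Methods are simple closed-form functions of word, sentence, and syllable counts. The sketch below uses the standard published formulas; the syllable counter is a crude vowel-group heuristic included only for illustration, not the validated software the authors may have used.

import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, dropping a silent trailing 'e'.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text):
    # Returns (Flesch Reading Ease, Flesch-Kincaid Grade Level).
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # mean words per sentence
    spw = syllables / len(words)   # mean syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

Higher Reading Ease scores indicate easier text, and the Grade Level approximates a US school grade; health-literacy guidance commonly recommends roughly a sixth-to-eighth-grade level, which is the benchmark the "exceeded recommended reading levels" finding refers to.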
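The statistical workflow described in the Methods can be sketched as follows. The scores below are synthetic placeholders (only Google's DISCERN mean and SD are reported in the abstract), and pairwise t-tests with a Bonferroni multiplier are one common way to implement the corrected comparison; the abstract does not specify the authors' exact post hoc procedure.

import numpy as np
from itertools import combinations
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Synthetic DISCERN scores, 20 sources per platform; only Google's
# mean/SD (3.33 +/- 0.53) comes from the abstract, the rest are assumed.
rng = np.random.default_rng(0)
scores = {
    "Google":  rng.normal(3.33, 0.53, 20),
    "Bing":    rng.normal(3.10, 0.50, 20),
    "Gemini":  rng.normal(2.90, 0.55, 20),
    "ChatGPT": rng.normal(2.70, 0.60, 20),
}

# One-way ANOVA across the four platforms.
f_stat, p_val = stats.f_oneway(*scores.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4f}")

# Pairwise t-tests with a Bonferroni correction: multiply each raw
# p-value by the number of comparisons (6 for four groups), cap at 1.
pairs = list(combinations(scores, 2))
for a, b in pairs:
    _, p = stats.ttest_ind(scores[a], scores[b])
    print(f"{a} vs {b}: corrected p = {min(p * len(pairs), 1.0):.4f}")

# Inter-rater reliability on an ordinal DISCERN item (hypothetical
# ratings from the two reviewers); kappa = (p_o - p_e) / (1 - p_e).
rater1 = [3, 4, 2, 5, 3, 4, 2, 3]
rater2 = [3, 4, 3, 5, 3, 4, 2, 3]
print(f"Cohen's kappa: {cohen_kappa_score(rater1, rater2):.2f}")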