Nelson Houston C, Beauchamp Morgan T, Pace April A
Dermatology, Eastern Virginia Medical School, Macon and Joan Brock Virginia Health Sciences at Old Dominion University, Norfolk, USA.
Public Health, Eastern Virginia Medical School, Macon and Joan Brock Virginia Health Sciences at Old Dominion University, Norfolk, USA.
Cureus. 2025 Jun 22;17(6):e86543. doi: 10.7759/cureus.86543. eCollection 2025 Jun.
Background: The internet has become a primary source of health information for the public, with important implications for patient decision-making and public health outcomes. However, the quality and readability of this content vary widely. With the rise of generative artificial intelligence (AI) tools such as ChatGPT and Gemini, new challenges and opportunities have emerged in how patients access and interpret medical information.
Objective: To evaluate and compare the quality, credibility, and readability of consumer health information provided by traditional search engines (Google, Bing) and generative AI platforms (ChatGPT, Gemini) using three validated instruments: DISCERN, the JAMA Benchmark Criteria, and Flesch-Kincaid readability metrics.
Methods: Twenty health-related webpages per platform were collected using a standardized query across Google, Bing, Gemini, and ChatGPT. Each source was assessed independently by two reviewers using the DISCERN instrument and the adapted JAMA benchmark criteria. Readability was evaluated using the Flesch Reading Ease and Flesch-Kincaid Grade Level scores. One-way ANOVA with Bonferroni correction was used to compare platform performance, and Cohen's kappa measured inter-rater reliability.
Results: Google achieved the highest mean scores for both quality and credibility (DISCERN: 3.33 ± 0.53; JAMA: 3.70 ± 0.44), followed by Bing, Gemini, and ChatGPT. ChatGPT received the lowest scores across all quality measures. Readability analysis revealed no statistically significant differences between platforms; however, all content exceeded recommended reading levels for public health information. Cohen's kappa indicated strong inter-rater agreement across DISCERN items.
Conclusion: Google remains the most reliable source of high-quality, readable health information among the evaluated platforms. Generative AI tools such as ChatGPT and Gemini, while increasingly popular, exhibited notable limitations in accuracy and transparency, along with greater textual complexity. These findings highlight the need for improved oversight, transparency, and user education regarding AI-generated health content.
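For context, the two Flesch-Kincaid metrics named in the Methods are simple closed-form functions of word, sentence, and syllable counts. The sketch below uses the standard published formulas; the syllable counter is a crude vowel-group heuristic included only for illustration, not the validated software the authors may have used.

import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, dropping a silent trailing 'e'.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text):
    # Returns (Flesch Reading Ease, Flesch-Kincaid Grade Level).
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # mean words per sentence
    spw = syllables / len(words)   # mean syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

Higher Reading Ease scores indicate easier text, and the Grade Level approximates a US school grade; health-literacy guidance commonly recommends roughly a sixth-to-eighth-grade level, which is the benchmark the "exceeded recommended reading levels" finding refers to.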
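The statistical workflow described in the Methods can be sketched as follows. The scores below are synthetic placeholders (only Google's DISCERN mean and SD are reported in the abstract), and pairwise t-tests with a Bonferroni multiplier are one common way to implement the corrected comparison; the abstract does not specify the authors' exact post hoc procedure.

import numpy as np
from itertools import combinations
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Synthetic DISCERN scores, 20 sources per platform; only Google's
# mean/SD (3.33 +/- 0.53) comes from the abstract, the rest are assumed.
rng = np.random.default_rng(0)
scores = {
    "Google":  rng.normal(3.33, 0.53, 20),
    "Bing":    rng.normal(3.10, 0.50, 20),
    "Gemini":  rng.normal(2.90, 0.55, 20),
    "ChatGPT": rng.normal(2.70, 0.60, 20),
}

# One-way ANOVA across the four platforms.
f_stat, p_val = stats.f_oneway(*scores.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4f}")

# Pairwise t-tests with a Bonferroni correction: multiply each raw
# p-value by the number of comparisons (6 for four groups), cap at 1.
pairs = list(combinations(scores, 2))
for a, b in pairs:
    _, p = stats.ttest_ind(scores[a], scores[b])
    print(f"{a} vs {b}: corrected p = {min(p * len(pairs), 1.0):.4f}")

# Inter-rater reliability on an ordinal DISCERN item (hypothetical
# ratings from the two reviewers); kappa = (p_o - p_e) / (1 - p_e).
rater1 = [3, 4, 2, 5, 3, 4, 2, 3]
rater2 = [3, 4, 3, 5, 3, 4, 2, 3]
print(f"Cohen's kappa: {cohen_kappa_score(rater1, rater2):.2f}")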