Singavarapu Joshua, Khemlani Amber, Jacobs Menachem, Berglas Eli, Lazar Jason, Kabarriti Abdo
Cardiology, State University of New York Downstate Health Sciences University, Brooklyn, USA.
Urology, State University of New York Downstate Health Sciences University, Brooklyn, USA.
Cureus. 2025 Jul 16;17(7):e88085. doi: 10.7759/cureus.88085. eCollection 2025 Jul.
Artificial intelligence (AI) is increasingly being utilized as an informational resource, with chatbots attracting users for their ability to generate instantaneous responses. This study evaluates the understandability, actionability, readability, quality, and misinformation in medical information provided by four prominent chatbots (Bard, ChatGPT 3.5, Claude 2.0, and Perplexity) on three prevalent cardiovascular diseases (CVDs): myocardial infarction, heart failure, and arrhythmias. These chatbots were selected for their popularity and high usage rates. Using Google Trends, the top five U.S. search queries related to heart attack, arrhythmia, and heart failure from September 29, 2018, to September 29, 2023, were identified. These queries were chosen because they accounted for over 80% of the public's searches on these topics. The chatbot responses were blinded and analyzed by two evaluators using DISCERN for quality, the Patient Education Materials Assessment Tool (PEMAT) for understandability and actionability, and Flesch-Kincaid scores for readability. Statistical tests included the Kruskal-Wallis test for DISCERN scores, the chi-square test for PEMAT scores, and one-way ANOVA for Flesch-Kincaid scores. Bard generated responses with a statistically significantly lower Flesch-Kincaid grade level than the other chatbots. Bard and ChatGPT 3.5 provided more actionable responses. Among the CVD topics, "heart attack" yielded lower-grade-level responses and more actionable information than "arrhythmia" and "heart failure." This study is among the first to assess AI credibility in disseminating cardiovascular information. It highlights how acute pathologic events may prompt more actionable and accessible chatbot responses. As AI continues to evolve, collaboration among healthcare professionals, researchers, and developers is crucial to ensuring the safe and effective use of AI in patient education and public health.
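The Flesch-Kincaid readability metric used in the study follows a standard published formula. As a minimal sketch of how such a grade-level score can be computed over chatbot output, the snippet below uses the standard Flesch-Kincaid grade-level formula with a naive vowel-group syllable counter (the exact text-processing pipeline the authors used is not specified in the abstract, so the tokenization and syllable heuristics here are illustrative assumptions):

```python
import re

def count_syllables(word):
    # Naive heuristic: count groups of consecutive vowels, then drop a
    # trailing silent "e". Real readability tools use dictionary-based
    # syllable counts, so results will differ slightly.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    # Standard Flesch-Kincaid grade-level formula:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A higher score corresponds to a higher U.S. school grade level, so the lower scores reported for Bard and for "heart attack" responses indicate text that is easier to read.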