

Evaluating the Reliability and Quality of Sarcoidosis-Related Information Provided by AI Chatbots.

Authors

Yetkin Nur Aleyna, Baran Burcu, Rabahoğlu Bilal, Tutar Nuri, Gülmez İnci

Affiliation

Department of Pulmonology, Faculty of Medicine, Erciyes University, 38039 Kayseri, Türkiye.

Publication Information

Healthcare (Basel). 2025 Jun 5;13(11):1344. doi: 10.3390/healthcare13111344.

Abstract

Background/Objectives: Artificial intelligence (AI) chatbots are increasingly employed for the dissemination of health information; however, concerns regarding their accuracy and reliability remain. The complexity of sarcoidosis may lead to misinformation and omissions that affect patient comprehension. This study assessed the usability of AI-generated information on sarcoidosis by evaluating the quality, reliability, readability, understandability, and actionability of chatbot responses to patient-centered queries.

Methods: This cross-sectional evaluation included 11 AI chatbots comprising both general-purpose and retrieval-augmented tools. Four sarcoidosis-related queries derived from Google Trends were submitted to each chatbot under standardized conditions. Responses were independently evaluated by four blinded pulmonology experts using DISCERN, the Patient Education Materials Assessment Tool-Printable (PEMAT-P), and Flesch-Kincaid readability metrics. A Web Resource Rating (WRR) score was also calculated. Inter-rater reliability was assessed using intraclass correlation coefficients (ICCs).

Results: Retrieval-augmented models such as ChatGPT-4o Deep Research, Perplexity Research, and Grok3 Deep Search outperformed general-purpose chatbots across the DISCERN, PEMAT-P, and WRR metrics. However, these high-performing models also produced text at significantly higher reading levels (Flesch-Kincaid Grade Level > 16), reducing accessibility. Actionability scores were consistently lower than understandability scores across all models. The ICCs exceeded 0.80 for all evaluation domains, indicating excellent inter-rater reliability.

Conclusions: Although some AI chatbots can generate accurate and well-structured responses to sarcoidosis-related questions, their limited readability and low actionability present barriers to effective patient education. Optimization strategies, such as prompt refinement, health literacy adaptation, and domain-specific model development, are required to improve the utility of AI chatbots in complex disease communication.
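The Flesch-Kincaid Grade Level cited above is a standard readability formula based on average sentence length and average syllables per word. As a rough illustration of how such a score is computed (this is not the authors' code; the syllable counter is a crude vowel-group heuristic, and published tools use more careful tokenization):

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: count groups of consecutive vowels (min. one syllable).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_level(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A score above 16, as reported for the retrieval-augmented models, corresponds to text requiring post-graduate reading ability, well above the sixth-to-eighth-grade level usually recommended for patient education materials.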


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8758/12154112/fc727e9955a3/healthcare-13-01344-g001.jpg
