Behers Benjamin J, Stephenson-Moe Christoph A, Gibons Rebecca M, Vargas Ian A, Wojtas Caroline N, Rosario Manuel A, Anneaud Djhemson, Nord Profilia, Hamad Karen M, Baker Joel F
Department of Internal Medicine, Sarasota Memorial Hospital, Sarasota, USA.
Department of Clinical Sciences, Florida State University College of Medicine, Tallahassee, USA.
Cureus. 2024 Sep 23;16(9):e69996. doi: 10.7759/cureus.69996. eCollection 2024 Sep.
Background
Health literacy empowers patients to participate in their own healthcare. Personal health literacy is one's ability to find, understand, and use information and resources to make well-informed health decisions. Artificial intelligence (AI) has become a source of health-related information through large language model (LLM)-driven chatbots, and numerous studies have assessed the readability and quality of the health information these chatbots produce. This study assesses the quality of patient education materials on cardiac catheterization produced by AI chatbots.

Methodology
We posed a set of 10 questions about cardiac catheterization to four chatbots: ChatGPT (OpenAI, San Francisco, CA), Microsoft Copilot (Microsoft Corporation, Redmond, WA), Google Gemini (Google DeepMind, London, UK), and Meta AI (Meta, New York, NY). The questions and their answers were used to create patient education materials on cardiac catheterization. The quality of these materials was assessed with two validated instruments for patient education materials: DISCERN and the Patient Education Materials Assessment Tool (PEMAT).

Results
Overall DISCERN scores were 4.5 for ChatGPT, 4.4 for Microsoft Copilot and Google Gemini, and 3.8 for Meta AI. ChatGPT, Microsoft Copilot, and Google Gemini tied for the highest reliability score at 4.6, while Meta AI had the lowest at 4.2. ChatGPT had the highest quality score at 4.4, while Meta AI had the lowest at 3.4. ChatGPT and Google Gemini had understandability scores of 100%, while Meta AI had the lowest at 82%. ChatGPT, Microsoft Copilot, and Google Gemini each had actionability scores of 75%, while Meta AI scored 50%.

Conclusions
ChatGPT produced the most reliable and highest quality materials, followed closely by Google Gemini; Meta AI produced the lowest quality materials. Given the easy access that chatbots provide patients and the high-quality responses we obtained, they could be a reliable source of information about cardiac catheterization.