Kular Seerat, Kumar Vikas
Medicine, All India Institute of Medical Sciences, Bathinda, Bathinda, IND.
Pharmacology, All India Institute of Medical Sciences, Bathinda, Bathinda, IND.
Cureus. 2025 Aug 1;17(8):e89234. doi: 10.7759/cureus.89234. eCollection 2025 Aug.
Introduction
Artificial intelligence (AI) chatbots have rapidly gained popularity for disseminating health information, especially with the recent growth of digital medicine. Recent studies have shown that Chat Generative Pre-Trained Transformer (ChatGPT; OpenAI, San Francisco, CA), a widely used AI chatbot, has at times surpassed emergency department physicians in diagnostic accuracy and has passed basic life support (BLS) exams, underscoring its potential for emergency use. Parents are a key demographic for online health information, frequently turning to chatbots for urgent guidance during child-related emergencies such as choking incidents. While research has extensively examined AI chatbots' effectiveness in delivering adult BLS guidelines, their accuracy and reliability in providing pediatric BLS guidance aligned with American Heart Association (AHA) standards remain underexplored. This gap raises concerns about the safety and appropriateness of relying on AI chatbots for guidance in pediatric emergencies. We therefore compared the performance of two ChatGPT versions, ChatGPT-4o and ChatGPT-4o mini, against established AHA pediatric protocols, aiming to optimize their integration into emergency response frameworks, pinpoint improvements needed for real-world use, and ensure trustworthy assistance for parents in critical situations.
Methodology
A prospective comparative content analysis was conducted, evaluating responses from ChatGPT (version 4o and its mini version) against the 2020 AHA Guidelines for Cardiopulmonary Resuscitation and Emergency Cardiovascular Care. The analysis focused on pediatric BLS, using 13 broad questions designed to cover all key components, from fundamental concepts such as the pediatric chain of survival to specific emergencies such as choking. Responses were evaluated for completeness and conformity to the AHA guidelines.
Completeness of the responses was rated as 'Completely Addressed', 'Partially Addressed', or 'Not Addressed', with partial responses further classified as 'Superficial', 'Inaccurate', or 'Hallucination'. Conformity to the AHA 2020 guidelines was analyzed and classified in the same way. Reliability was assessed using Cronbach's alpha, and Cohen's kappa was used to measure interrater agreement between responses generated on two separate devices for the same set of questions.
Results
Content analysis of the ChatGPT responses revealed that only 9.61% were fully addressed, and just 5.77% fully conformed to the AHA 2020 pediatric BLS guidelines. A majority of the responses (61.54%) were partially addressed and lacked depth, while 59.61% conformed only partially and superficially to the guidelines. Additionally, 5.77% of the queries were not addressed at all. ChatGPT-4o responses were generally more detailed and comprehensive than those from ChatGPT-4o mini. Interrater agreement between the two users ranged from slight to substantial.
Conclusions
While chatbots may assist with basic guidance, they lack the accuracy, depth, and hands-on instruction crucial for life-saving procedures. Misinterpreted or incomplete information from chatbots could lead to critical errors in emergencies. Widespread BLS training therefore remains essential to ensure individuals have the practical skills and precise knowledge needed to respond effectively in real-life situations.
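The abstract names two reliability statistics, Cronbach's alpha and Cohen's kappa, without detailing the computation. As a minimal sketch of how these are typically calculated, the pure-Python functions below implement the standard formulas; the example rating labels and score values are hypothetical illustrations, not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels
    to the same items: (p_o - p_e) / (1 - p_e), where p_o is the
    observed agreement and p_e the agreement expected by chance
    from each rater's label frequencies."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal label distributions
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(ca) | set(cb))
    return (p_o - p_e) / (1 - p_e)

def cronbach_alpha(item_scores):
    """Cronbach's alpha for internal consistency.
    item_scores: one list per item, each holding that item's
    numeric score across the same set of responses."""
    k = len(item_scores)          # number of items
    n = len(item_scores[0])       # number of responses per item

    def var(xs):
        # Sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total score per response, summed over items
    totals = [sum(item[j] for item in item_scores) for j in range(n)]
    return (k / (k - 1)) * (1 - sum(var(i) for i in item_scores) / var(totals))

# Hypothetical example: two raters labeling four responses as
# (C)ompletely, (P)artially, or (N)ot addressed.
kappa = cohens_kappa(["C", "P", "N", "P"], ["C", "P", "P", "P"])
```

Kappa values are conventionally read on the Landis-Koch scale, where 0.01-0.20 is "slight" and 0.61-0.80 "substantial" agreement, which is the range the Results section reports.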