Ebnali Harari Rayan, Altaweel Abdullah, Ahram Tareq, Keehner Madeleine, Shokoohi Hamid
STRATUS, Mass General Brigham, Harvard Medical School, MA, USA.
STRATUS, Mass General Brigham, Harvard Medical School, MA, USA; Ministry of Health, Kuwait.
Int J Med Inform. 2025 Mar;195:105701. doi: 10.1016/j.ijmedinf.2024.105701. Epub 2024 Nov 29.
The integration of generative artificial intelligence (AI) as a clinical decision support system (CDSS) in telemedicine presents a significant opportunity to enhance clinical outcomes, yet its application remains underexplored.
This study investigates the efficacy of one of the most common generative AI tools, ChatGPT, for providing clinical guidance during cardiac arrest scenarios.
We compared performance, cognitive load, and trust across three conditions: a traditional paper guide, autonomous ChatGPT, and clinician-supervised ChatGPT, in which a clinician vetted the AI's recommendations. Fifty-four participants without medical backgrounds took part in a randomized controlled trial, each assigned to one of the three intervention groups: paper guide, ChatGPT, or supervised ChatGPT. Participants completed a standardized CPR scenario using an augmented reality (AR) headset while performance, physiological, and self-reported metrics were recorded.
Results indicate that the supervised-ChatGPT group achieved significantly higher decision accuracy than the paper-guide and ChatGPT groups, although scenario completion time was longer. Physiological data showed a reduced LF/HF ratio (low-frequency to high-frequency heart rate variability) in the supervised-ChatGPT group, suggesting potentially lower cognitive load. Trust in the AI was also highest in the supervised condition. In one instance, ChatGPT suggested a risky option, underscoring the need for clinician supervision.
Our findings highlight the potential of supervised generative AI to enhance decision-making accuracy and user trust in emergency healthcare settings, despite trade-offs with response time. The study underscores the importance of clinician oversight and the need for further refinement of AI systems to improve safety. Future research should explore strategies to optimize AI supervision and assess the implementation of these systems in real-world clinical settings.