Levin Chedva, Suliman Moriya, Naimi Etti, Saban Mor
Nursing Department, Faculty of School of Life and Health Sciences, The Jerusalem College of Technology-Lev Academic Center, Jerusalem, Israel.
Department of Vascular Surgery, The Chaim Sheba Medical Center, Ramat Gan, Tel Aviv, Israel.
J Clin Nurs. 2024 Aug 5. doi: 10.1111/jocn.17384.
As generative artificial intelligence (GenAI) tools continue advancing, rigorous evaluations are needed to understand their capabilities relative to experienced clinicians and nurses. The aim of this study was to objectively compare the diagnostic accuracy and response formats of ICU nurses versus various GenAI models, with a qualitative interpretation of the quantitative results.
This formative study used four written clinical scenarios, representative of real ICU patient cases, to simulate diagnostic challenges. The scenarios were developed by expert nurses and validated against current literature. Seventy-four ICU nurses participated in a simulation-based assessment involving these scenarios. In parallel, we asked ChatGPT-4 and Claude-2.0 to provide initial assessments and treatment recommendations for the same scenarios. The responses from ChatGPT-4 and Claude-2.0 were then scored by certified ICU nurses for accuracy, completeness and response time.
Nurses consistently achieved higher diagnostic accuracy than the GenAI models across open-ended scenarios, though certain models matched or exceeded human performance on standardized cases. Reaction times also diverged substantially. Qualitative differences in response format emerged, such as concision versus verbosity. Variations in GenAI model performance across cases highlighted generalizability challenges.
While GenAI demonstrated valuable skills, experienced nurses outperformed it in open-ended domains requiring holistic judgement. Continued development to strengthen generalized decision-making abilities is warranted before autonomous clinical integration. Response interfaces should be designed to leverage the distinct strengths of nurses and GenAI. Rigorous mixed-methods research involving diverse stakeholders can help iteratively inform safe, beneficial human-GenAI partnerships centred on experience-guided care augmentation.
This mixed-methods simulation study provides formative insights into optimizing collaborative models of GenAI and nursing knowledge to support patient assessment and decision-making in intensive care. The findings can help guide development of explainable GenAI decision support tailored for critical care environments.
Neither patients nor the public were involved in the design or implementation of the study, or in the analysis and interpretation of the data.