

Developing and Evaluating Large Language Model-Generated Emergency Medicine Handoff Notes.

Author information

Hartman Vince, Zhang Xinyuan, Poddar Ritika, McCarty Matthew, Fortenko Alexander, Sholle Evan, Sharma Rahul, Campion Thomas, Steel Peter A D

Affiliations

Abstractive Health, New York, New York.

Department of Emergency Medicine, NewYork-Presbyterian/Weill Cornell Medicine, New York.

Publication information

JAMA Netw Open. 2024 Dec 2;7(12):e2448723. doi: 10.1001/jamanetworkopen.2024.48723.

Abstract

IMPORTANCE

An emergency medicine (EM) handoff note generated by a large language model (LLM) has the potential to reduce physician documentation burden without compromising the safety of EM-to-inpatient (IP) handoffs.

OBJECTIVE

To develop LLM-generated EM-to-IP handoff notes and evaluate their accuracy and safety compared with physician-written notes.

DESIGN, SETTING, AND PARTICIPANTS

This cohort study used the medical records of EM patients with acute hospital admissions in 2023 at NewYork-Presbyterian/Weill Cornell Medical Center. A customized clinical LLM pipeline was trained, tested, and evaluated to generate templated EM-to-IP handoff notes. LLM-generated handoff notes were compared with physician-written notes using both conventional automated methods (ie, recall-oriented understudy for gisting evaluation [ROUGE], bidirectional encoder representations from transformers score [BERTScore], and source chunking approach for large-scale inconsistency evaluation [SCALE]) and a novel patient safety-focused framework. Data were analyzed from October 2023 to March 2024.
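ROUGE scores the n-gram overlap between a generated text and a reference text. The abstract does not state which ROUGE variant, tokenizer, or reference pairing the study used, so the following is only a minimal sketch, assuming ROUGE-N F1 with simple lowercase whitespace tokenization. The function names and the two example sentences are hypothetical, and this is not the study's pipeline.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split. Reference ROUGE packages
    # use their own tokenizers and may apply stemming.
    return text.lower().split()


def ngram_counts(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams, e.g. n=2 gives bigrams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1 between a candidate summary and a single reference text."""
    cand = ngram_counts(tokenize(candidate), n)
    ref = ngram_counts(tokenize(reference), n)
    # Clipped overlap: a shared n-gram counts at most min(count in cand, count in ref) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0  # also avoids dividing by zero for empty inputs
    precision = overlap / sum(cand.values())  # share of candidate n-grams found in the reference
    recall = overlap / sum(ref.values())      # share of reference n-grams recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    llm_note = "patient admitted for chest pain"         # hypothetical LLM output
    physician_note = "patient admitted with chest pain"  # hypothetical reference
    print(round(rouge_n_f1(llm_note, physician_note, n=1), 3))  # 0.8
    print(round(rouge_n_f1(llm_note, physician_note, n=2), 3))  # 0.5
```

Because ROUGE only rewards exact token matches, a faithful paraphrase with different wording can still score low, which is one reason the study paired it with embedding-based and fidelity-based metrics.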

EXPOSURE

LLM-generated EM handoff notes.

MAIN OUTCOMES AND MEASURES

LLM-generated handoff notes were evaluated for (1) lexical similarity with respect to physician-written notes using ROUGE and BERTScore; (2) fidelity with respect to source notes using SCALE; and (3) readability, completeness, curation, correctness, usefulness, and implications for patient safety using a novel framework.
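BERTScore replaces exact n-gram matching with similarity between contextual token embeddings: each token is greedily matched to its most similar counterpart by cosine similarity, and the averaged matches give precision, recall, and F1. The following is a minimal sketch, assuming the token embeddings have already been computed. The toy 2-D vectors are hypothetical stand-ins for a BERT-family model's outputs, and the optional IDF weighting and baseline rescaling are omitted, so it does not reproduce the study's reported values.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    # Scale to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("token embedding must be non-zero")
    return [x / norm for x in v]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def bertscore(cand_emb: list[Vector], ref_emb: list[Vector]) -> tuple[float, float, float]:
    """Greedy-matching precision, recall and F1 over token embeddings."""
    cand = [normalize(v) for v in cand_emb]
    ref = [normalize(v) for v in ref_emb]
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(dot(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(dot(r, c) for c in cand) for r in ref) / len(ref)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy embeddings: the candidate covers only one of two reference tokens.
    llm_tokens = [[1.0, 0.0]]
    physician_tokens = [[1.0, 0.0], [0.0, 1.0]]
    p, r, f = bertscore(llm_tokens, physician_tokens)
    print(p, r, round(f, 3))  # 1.0 0.5 0.667
```

Greedy matching lets near-synonyms with close embeddings earn partial credit, which is why BERTScore values tend to sit much closer together than ROUGE values for the same pair of texts.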

RESULTS

In this study of 1600 EM patient records (832 [52%] female; mean [SD] age, 59.9 [18.9] years), LLM-generated handoff notes, compared with physician-written ones, had higher ROUGE (0.322 vs 0.088), BERTScore (0.859 vs 0.796), and SCALE (0.691 vs 0.456) scores, indicating that the LLM-generated summaries exhibited greater similarity and more detail. On review by 3 board-certified EM physicians, a subsample of 50 LLM-generated summaries had a mean (SD) usefulness score of 4.04 (0.86) out of 5 (vs 4.36 [0.71] for physician-written notes) and a mean (SD) patient safety score of 4.06 (0.86) out of 5 (vs 4.50 [0.56] for physician-written notes). None of the LLM-generated summaries were classified as a critical patient safety risk.

CONCLUSIONS AND RELEVANCE

In this cohort study of 1600 EM patient medical records, LLM-generated EM-to-IP handoff notes were superior to physician-written summaries by conventional automated evaluation methods but marginally inferior in usefulness and safety by a novel evaluation framework. This study suggests the importance of a physician-in-the-loop implementation design for this model and demonstrates an effective strategy for measuring the preimplementation patient safety of LLMs.


