


Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model.

Authors

Small William R, Austrian Jonathan, O'Donnell Luke, Burk-Rafel Jesse, Hochman Katherine A, Goodman Adam, Zaretsky Jonah, Martin Jacob, Johnson Stephen, Major Vincent J, Jones Simon, Henke Christian, Verplanke Benjamin, Osso Jwan, Larson Ian, Saxena Archana, Mednick Aron, Simonis Choumika, Han Joseph, Kesari Ravi, Wu Xinyuan, Heery Lauren, Desel Tenzin, Baskharoun Samuel, Figman Noah, Farooq Umar, Shah Kunal, Jahan Nusrat, Kim Jeong Min, Testa Paul, Feldman Jonah

Affiliations

Department of Health Informatics, New York University Langone Medical Center Information Technology.

Department of Medicine, New York University Grossman School of Medicine.

Publication

JAMA Netw Open. 2025 Aug 1;8(8):e2526339. doi: 10.1001/jamanetworkopen.2025.26339.

DOI: 10.1001/jamanetworkopen.2025.26339
PMID: 40802185
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12351420/
Abstract

IMPORTANCE

Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown.

OBJECTIVES

To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC.

DESIGN, SETTING, AND PARTICIPANTS

Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health.

EXPOSURES

Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents blinded to author type edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists.

MAIN OUTCOMES AND MEASURES

Editing effort was quantified by analyzing the edits that occurred on the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales).
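
The abstract does not give the exact formula behind "percentage edited", so as a rough illustration only, the idea of measuring editing effort after controlling for length can be sketched with a token-level diff; the function name and normalization below are assumptions, not the authors' method:

```python
# Illustrative sketch only: the study's exact "percentage edited" metric is
# not published in the abstract. Here it is approximated as the share of
# original word tokens that do not survive unchanged into the edited summary.
from difflib import SequenceMatcher

def percent_edited(original: str, edited: str) -> float:
    orig_tokens = original.split()
    edit_tokens = edited.split()
    if not orig_tokens:
        return 0.0
    matcher = SequenceMatcher(a=orig_tokens, b=edit_tokens)
    # Count tokens of the original that reappear, in order, in the edit.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (1.0 - kept / len(orig_tokens))
```

Under this sketch an untouched summary scores 0% and a complete rewrite scores 100%, which gives a feel for the reported means (31.5% for LLM HCs vs 44.8% for physician HCs on the study's own metric).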

RESULTS

Among 100 admissions, compared with physician HCs, residents edited a smaller percentage of LLM HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20), and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46).
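
The abstract states that 5-point Likert A/B ratings were converted to 10-point bidirectional scales but does not specify the conversion; one plausible mapping, with all names hypothetical, is to sign each strength-of-preference rating by which summary the hospitalist favored, and sum the four 4Cs scales into the 40-point composite:

```python
# Hypothetical reconstruction, not the authors' published conversion:
# each hospitalist picks the preferred HC in an A/B pair and rates the
# strength of preference 1-5; signing that rating (positive = favors the
# LLM HC, negative = favors the physician HC) yields a -5..+5 value,
# i.e. a 10-point bidirectional scale.

FOUR_CS = ("complete", "concise", "cohesive", "confabulation_free")

def bidirectional(preferred: str, strength: int) -> int:
    assert preferred in ("llm", "physician") and 1 <= strength <= 5
    return strength if preferred == "llm" else -strength

def composite(scores: dict[str, int]) -> int:
    # Summing the four 10-point 4Cs scales gives the 40-point composite.
    return sum(scores[c] for c in FOUR_CS)

ratings = {
    "complete": bidirectional("llm", 3),       # LLM HC clearly more complete
    "concise": bidirectional("physician", 1),  # physician HC slightly more concise
    "cohesive": bidirectional("llm", 1),
    "confabulation_free": bidirectional("physician", 1),
}
# composite(ratings) → 2
```

This single rater's pattern mirrors the direction of the reported group means (LLM ahead on completeness, behind on confabulations, composite near zero).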

CONCLUSIONS AND RELEVANCE

Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4ba/12351420/5aefe712c750/jamanetwopen-e2526339-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4ba/12351420/47d160bda98c/jamanetwopen-e2526339-g002.jpg

Similar Articles

1. Physician- and Large Language Model-Generated Hospital Discharge Summaries. JAMA Intern Med. 2025 May 5. doi: 10.1001/jamainternmed.2025.0821.
2. Improving Large Language Models' Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation. JMIR Med Inform. 2025 Jul 24;13:e66476. doi: 10.2196/66476.
3. Developing and Evaluating Large Language Model-Generated Emergency Medicine Handoff Notes. JAMA Netw Open. 2024 Dec 2;7(12):e2448723. doi: 10.1001/jamanetworkopen.2024.48723.
4. Identification of Long-Term Care Facility Residence From Admission Notes Using Large Language Models. JAMA Netw Open. 2025 May 1;8(5):e2512032. doi: 10.1001/jamanetworkopen.2025.12032.
5. Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Netw Open. 2024 Oct 1;7(10):e2440969. doi: 10.1001/jamanetworkopen.2024.40969.
6. [Preliminary exploration of the applications of five large language models in the field of oral auxiliary diagnosis, treatment and health consultation]. Zhonghua Kou Qiang Yi Xue Za Zhi. 2025 Jul 30;60(8):871-878. doi: 10.3760/cma.j.cn112144-20241107-00418.
7. A dataset and benchmark for hospital course summarization with adapted large language models. J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
