Verma Ritchie, Alsentzer Emily, Strasser Zachary, Chang Leslie, Roman Kirollos, Gershanik Esteban, Hernandez Camellia, Linares Miguel, Rodriguez Jorge, Thakral Durga, Unlu Ozan, You Jacqueline, Zhou Li, Bates David
Massachusetts General Hospital, Boston, MA, USA, 02114.
Present Address: Oregon Health and Science University, Portland, OR, USA, 97239.
medRxiv. 2025 Jun 3:2025.06.02.25328807. doi: 10.1101/2025.06.02.25328807.
Information overload in electronic health records (EHRs) hampers clinicians' ability to efficiently extract and synthesize critical information from a patient's longitudinal health record, leading to increased cognitive burden and delays in care. This study explores the potential of large language models (LLMs) to address this challenge by generating problem-based admission summaries for patients admitted with heart failure, a leading cause of hospitalization worldwide. We developed an extract-then-abstract approach guided by disease-specific "summary bundles" to generate summaries of longitudinal clinical notes that prioritize clinically relevant information. Through a mixed-methods evaluation using real-world clinical notes, we compared physicians' ability to answer patient-specific clinical questions with the LLM-generated summaries versus standard chart review. While summary access did not significantly reduce overall questionnaire completion time, frequent summary use significantly contributed to faster questionnaire completion (p = 0.002). Individual physicians varied in how effectively they leveraged the summaries. Importantly, summary use maintained accuracy in answering clinical questions (88.0% with summaries vs. 86.4% without). All physicians indicated they were "likely" or "very likely" to use the summaries in clinical practice, and 87.5% reported that the summaries would save them time. Preferences for summary format varied, highlighting the need for customizable summaries aligned with individual clinician workflows. This study provides one of the first extrinsic evaluations of LLMs for longitudinal summarization, demonstrating their potential to enhance clinician efficiency, alleviate workload, and support informed decision-making in time-sensitive care environments.
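The extract-then-abstract approach described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the bundle items, keyword lists, and sample notes are illustrative stand-ins, not the paper's actual "summary bundles," and the abstraction step is stubbed with simple string joining where the real system would prompt an LLM.

```python
# Hypothetical sketch of an extract-then-abstract pipeline guided by a
# disease-specific "summary bundle". Bundle items and keywords below are
# invented for illustration; the paper's actual bundles are not public here.

SUMMARY_BUNDLE = {
    "ejection_fraction": ["ejection fraction"],
    "diuretics": ["furosemide", "bumetanide", "diuretic"],
    "volume_status": ["edema", "weight gain", "jvd"],
}

def extract(notes, bundle):
    """Extractive step: collect note sentences matching any bundle keyword."""
    hits = {item: [] for item in bundle}
    for note in notes:
        for sentence in note.split("."):
            s = sentence.strip()
            for item, keywords in bundle.items():
                if any(k in s.lower() for k in keywords):
                    hits[item].append(s)
    return hits

def abstract_summary(hits):
    """Stand-in for the abstractive step: joins extracts per bundle item.
    A real system would instead send these extracts to an LLM with a
    problem-oriented summarization prompt."""
    return "\n".join(
        f"{item}: " + "; ".join(sents)
        for item, sents in hits.items()
        if sents
    )

notes = [
    "Ejection fraction 35% on echo. Started furosemide 40 mg daily.",
    "Exam notable for 2+ pitting edema. Plans reviewed with patient.",
]
print(abstract_summary(extract(notes, SUMMARY_BUNDLE)))
```

The design point is the two-stage split: extraction narrows the longitudinal record to bundle-relevant passages first, so the abstractive model summarizes a focused, clinically prioritized subset rather than the full chart.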