Department of Electrical Engineering, Stanford University, Stanford, CA, USA.
Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA.
Nat Med. 2024 Apr;30(4):1134-1142. doi: 10.1038/s41591-024-02855-5. Epub 2024 Feb 27.
Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP) tasks, their effectiveness on a diverse range of clinical summarization tasks remains unproven. Here we applied adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes and doctor-patient dialogue. Quantitative assessments with syntactic, semantic and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.