Performance and improvement strategies for adapting generative large language models for electronic health record applications: A systematic review.

Author Information

Du Xinsong, Zhou Zhengyang, Wang Yifei, Chuang Ya-Wen, Li Yiming, Yang Richard, Hong Pengyu, Bates David W, Zhou Li

Affiliations

Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, United States; Department of Medicine, Harvard Medical School, Boston, MA 02115, United States.

Department of Computer Science, Brandeis University, Waltham, MA 02453, United States.

Publication Information

Int J Med Inform. 2025 Aug 28;205:106091. doi: 10.1016/j.ijmedinf.2025.106091.

Abstract

PURPOSE

To synthesize the performance of, and improvement strategies for, adapting generative large language models (LLMs) for electronic health record (EHR) analyses and applications.

METHODS

Following the PRISMA guidelines, we conducted a systematic review of articles from PubMed and Web of Science published between January 1, 2023 and November 9, 2024. Multiple reviewers, including biomedical informaticians and a clinician, were involved in the article review process. Studies were included if they used generative LLMs to analyze real-world EHR data and reported quantitative performance evaluations of an improvement technique. The review identified key clinical applications and summarized performance and improvement strategies.

RESULTS

Of the 18,735 articles retrieved, 196 met our inclusion criteria: 112 (57.1%) used generative LLMs for clinical decision support tasks, 40 (20.4%) for documentation tasks, 39 (19.9%) for information extraction tasks, 11 (5.6%) for patient communication tasks, and 10 (5.1%) for summarization tasks. Most of the 196 studies (172, 87.8%) did not quantitatively evaluate LLM performance improvement strategies; the remaining 24 (12.2%) quantitatively evaluated the effectiveness of in-context learning (9 studies), fine-tuning (12 studies), multimodal integration (8 studies), and ensemble learning (2 studies). Three studies highlighted that few-shot prompting, fine-tuning, and multimodal data integration might not improve performance, and another two studies found that a fine-tuned smaller model could outperform a larger model.

CONCLUSION

Applying a performance improvement strategy does not necessarily lead to better performance. Detailed guidelines on how to apply these strategies more effectively and safely are needed, and future quantitative analyses can inform their development.

