Du Xinsong, Zhou Zhengyang, Wang Yifei, Chuang Ya-Wen, Li Yiming, Yang Richard, Hong Pengyu, Bates David W, Zhou Li
Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, United States; Department of Medicine, Harvard Medical School, Boston, MA 02115, United States.
Department of Computer Science, Brandeis University, Waltham, MA 02453, United States.
Int J Med Inform. 2025 Aug 28;205:106091. doi: 10.1016/j.ijmedinf.2025.106091.
PURPOSE: To synthesize the performance of generative large language models (LLMs), and the strategies used to improve it, when these models are adapted for electronic health record (EHR) analyses and applications.

METHODS: We followed the PRISMA guidelines to conduct a systematic review of articles from PubMed and Web of Science published between January 1, 2023 and November 9, 2024. Multiple reviewers, including biomedical informaticians and a clinician, were involved in the article review process. Studies were included if they used generative LLMs to analyze real-world EHR data and reported quantitative performance evaluations for an improvement technique. The review identified key clinical applications and summarized the reported performance and improvement strategies.

RESULTS: Of the 18,735 articles retrieved, 196 met our criteria. Of these, 112 (57.1%) studies used generative LLMs for clinical decision support tasks, 40 (20.4%) involved documentation tasks, 39 (19.9%) involved information extraction tasks, 11 (5.6%) involved patient communication tasks, and 10 (5.1%) included summarization tasks. Among the 196 studies, most (88.8%) did not quantitatively evaluate LLM performance improvement strategies. The remaining 24 studies (12.2%) quantitatively evaluated the effectiveness of in-context learning (9 studies), fine-tuning (12 studies), multimodal integration (8 studies), and ensemble learning (2 studies). Three studies highlighted that few-shot prompting, fine-tuning, and multimodal data integration might not improve performance, and another two studies found that a fine-tuned smaller model could outperform a larger model.

CONCLUSION: Applying a performance improvement strategy does not necessarily improve performance. Detailed guidelines on how to apply these strategies more effectively and safely are needed, and they could be informed by more quantitative analyses in future work.