
Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review.

Authors

Du Xinsong, Zhou Zhengyang, Wang Yifei, Chuang Ya-Wen, Yang Richard, Zhang Wenyu, Wang Xinyi, Zhang Rui, Hong Pengyu, Bates David W, Zhou Li

Affiliations

Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115.

Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115.

Publication information

medRxiv. 2024 Aug 19:2024.08.11.24311828. doi: 10.1101/2024.08.11.24311828.

DOI:10.1101/2024.08.11.24311828
PMID:39228726
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11370524/
Abstract

BACKGROUND

Generative large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) is still rare and presents numerous challenges.

OBJECTIVE

This study aims to systematically review the use of generative LLMs and the effectiveness of related techniques in patient care topics involving EHRs, to summarize the challenges faced, and to suggest future directions.

METHODS

A Boolean search for peer-reviewed articles was conducted on May 19, 2024, using PubMed and Web of Science to include research articles published since 2023 (one month after the release of ChatGPT). The search results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and conducted data extraction. Only studies utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal EHR data, and evaluation metrics. Additionally, we identified current challenges in applying LLMs in clinical settings as reported by the included studies and proposed future directions.
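The abstract does not say how the search results were deduplicated. As a rough illustration only, the sketch below assumes each exported record is a dictionary with hypothetical `doi` and `title` fields, and treats two records as the same article when either the DOI or the normalized title matches. The function names and the matching rule are assumptions, not the reviewers' procedure.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Return records in input order, skipping any whose DOI or title was already seen.

    Assumption: `records` is a list of dicts with optional "doi" and "title" keys,
    e.g. merged exports from two bibliographic databases.
    """
    seen_dois, seen_titles = set(), set()
    unique = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        title = normalize_title(rec.get("title") or "")
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # duplicate of an earlier record
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(rec)
    return unique
```

Exact matching on normalized titles misses near-duplicates with spelling differences, so a real screening workflow would likely still need a manual check.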

RESULTS

The initial search identified 6,328 unique studies, of which 76 were included after eligibility screening. Of these, 67 studies (88.2%) employed zero-shot prompting; five of them reported 100% accuracy on five specific clinical tasks. Nine studies used advanced prompting strategies; four tested these strategies experimentally and found that prompt engineering improved performance, with one study noting a non-linear relationship between the number of examples in a prompt and performance improvement. Eight studies explored fine-tuning generative LLMs; all reported performance improvements on specific tasks, but three noted potential performance degradation on certain tasks after fine-tuning. Only two studies utilized multimodal data, which improved LLM-based decision-making and enabled accurate rare disease diagnosis and prognosis. The studies employed 55 different evaluation metrics for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias: one detected no bias, and the other found that male patients received more appropriate clinical decision-making suggestions. Six studies identified hallucinations, such as fabricated patient names in structured thyroid ultrasound reports. Additional challenges included, but were not limited to, the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had in understanding LLM responses.
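To make the distinction between zero-shot prompting and example-based prompting concrete, here is a minimal sketch of how such prompts might be assembled. The `build_prompt` helper, the prompt layout, and the sample task and notes are all hypothetical and invented for illustration; they are not drawn from any of the reviewed studies, and no model is called.

```python
def build_prompt(task, note, examples=()):
    """Assemble a text prompt for a clinical-note task.

    With no examples the prompt is zero-shot; each (note, answer) pair in
    `examples` adds one worked demonstration (few-shot prompting).
    """
    parts = [task]
    for example_note, example_answer in examples:
        parts.append(f"Note: {example_note}\nAnswer: {example_answer}")
    # The note to be answered goes last, with the answer left open.
    parts.append(f"Note: {note}\nAnswer:")
    return "\n\n".join(parts)


# Made-up task and notes, for illustration only.
task = "Does the note mention current tobacco use? Answer yes or no."
zero_shot = build_prompt(task, "Patient reports smoking one pack per day.")
few_shot = build_prompt(
    task,
    "Patient reports smoking one pack per day.",
    examples=[("Denies tobacco use.", "no"), ("Smokes cigars weekly.", "yes")],
)
```

Varying the length of `examples` is one way a study could probe how the number of in-prompt examples relates to performance.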

CONCLUSION

Our review indicates that few studies have employed advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due to challenges such as bias, hallucinations, and impersonal responses.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c1e8/11370524/efcacad04938/nihpp-2024.08.11.24311828v2-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c1e8/11370524/bd092ac5bf58/nihpp-2024.08.11.24311828v2-f0001.jpg

Similar articles

1
Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review.
medRxiv. 2024 Aug 19:2024.08.11.24311828. doi: 10.1101/2024.08.11.24311828.
2
Performance and improvement strategies for adapting generative large language models for electronic health record applications: A systematic review.
Int J Med Inform. 2025 Aug 28;205:106091. doi: 10.1016/j.ijmedinf.2025.106091.
3
Prescription of Controlled Substances: Benefits and Risks
4
Large Language Models and Empathy: Systematic Review.
J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
5
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
6
Development and evaluation of large-language models (LLMs) for oncology: A scoping review.
PLOS Digit Health. 2025 Aug 7;4(8):e0000980. doi: 10.1371/journal.pdig.0000980. eCollection 2025 Aug.
7
Question Answering for Electronic Health Records: Scoping Review of Datasets and Models.
J Med Internet Res. 2024 Oct 30;26:e53636. doi: 10.2196/53636.
8
A dataset and benchmark for hospital course summarization with adapted large language models.
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
9
Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline.
J Med Internet Res. 2025 Jul 11;27:e71916. doi: 10.2196/71916.
10
Examining the Role of Large Language Models in Orthopedics: Systematic Review.
J Med Internet Res. 2024 Nov 15;26:e59607. doi: 10.2196/59607.

Cited by

1
Research progress and implications of the application of large language model in shared decision-making in China's healthcare field.
Front Public Health. 2025 Jul 10;13:1605212. doi: 10.3389/fpubh.2025.1605212. eCollection 2025.
2
Using Natural Language Processing and Machine Learning to classify the status of kidney allograft in Electronic Medical Records written in Spanish.
PLoS One. 2025 May 8;20(5):e0322587. doi: 10.1371/journal.pone.0322587. eCollection 2025.
