EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records.

Author information

Shi Wenqi, Xu Ran, Zhuang Yuchen, Yu Yue, Zhang Jieyu, Wu Hang, Zhu Yuanda, Ho Joyce, Yang Carl, Wang May D

Affiliations

Georgia Institute of Technology.

Emory University.

Publication information

Proc Conf Empir Methods Nat Lang Process. 2024 Nov;2024:22315-22339. doi: 10.18653/v1/2024.emnlp-main.1245.

DOI: 10.18653/v1/2024.emnlp-main.1245

PMID: 40018366

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11867733/

Abstract

Clinicians often rely on data engineers to retrieve complex patient information from electronic health record (EHR) systems, a process that is both inefficient and time-consuming. We propose EHRAgent, a large language model (LLM) agent empowered with accumulative domain knowledge and robust coding capability. EHRAgent enables autonomous code generation and execution to facilitate clinicians in directly interacting with EHRs using natural language. Specifically, we formulate a multi-tabular reasoning task based on EHRs as a tool-use planning process, efficiently decomposing a complex task into a sequence of manageable actions with external toolsets. We first inject relevant medical information to enable EHRAgent to effectively reason about the given query, identifying and extracting the required records from the appropriate tables. By integrating interactive coding and execution feedback, EHRAgent then effectively learns from error messages and iteratively improves its originally generated code. Experiments on three real-world EHR datasets show that EHRAgent outperforms the strongest baseline by up to 29.6% in success rate, verifying its strong capacity to tackle complex clinical tasks with minimal demonstrations.
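
The execution-feedback loop described in the abstract (generate code, run it, pass any error message back, and regenerate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `run_code`, `solve`, `fake_llm`, the `answer` variable convention, and the `max_rounds` limit are all hypothetical names chosen for illustration, and the LLM call is replaced by a hard-coded stand-in.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_code(code: str) -> Tuple[Optional[object], Optional[str]]:
    """Run generated code; return (answer, None) on success or (None, error)."""
    namespace: dict = {}
    try:
        # Illustration only: real systems should sandbox untrusted code.
        exec(code, namespace)
        return namespace.get("answer"), None
    except Exception:
        # Keep only the final traceback line, e.g. "KeyError: 'Age'".
        return None, traceback.format_exc().strip().splitlines()[-1]


def solve(question: str,
          generate_code: Callable[[str, Optional[str], Optional[str]], str],
          max_rounds: int = 3) -> Optional[object]:
    """Regenerate code from the previous attempt and its error until it runs."""
    code, error = None, None
    for _ in range(max_rounds):
        code = generate_code(question, code, error)
        answer, error = run_code(code)
        if error is None:
            return answer
    return None  # no working code within the round limit


def fake_llm(question, previous_code, error):
    """Hypothetical stand-in for an LLM: the first draft uses a wrong column name."""
    table = "rows = [{'patient_id': 1, 'age': 70}]\n"
    if error is None:
        return table + "answer = rows[0]['Age']"  # buggy first attempt
    return table + "answer = rows[0]['age']"      # revision after seeing the error


result = solve("How old is patient 1?", fake_llm)
print(result)
```

In this toy run the first draft fails with a `KeyError`, the error text is handed back to the generator, and the second draft succeeds. A real agent would also supply table schemas, domain knowledge, and few-shot demonstrations in the prompt, which this sketch omits.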
