Lessons learned on information retrieval in electronic health records: a comparison of embedding models and pooling strategies.

Authors

Myers Skatje, Miller Timothy A, Gao Yanjun, Churpek Matthew M, Mayampurath Anoop, Dligach Dmitriy, Afshar Majid

Affiliations

Department of Medicine, University of Wisconsin-Madison, Madison, WI 53726, United States.

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02215, United States.

Publication information

J Am Med Inform Assoc. 2025 Feb 1;32(2):357-364. doi: 10.1093/jamia/ocae308.

DOI:10.1093/jamia/ocae308

PMID:39703187

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11756698/

Abstract

OBJECTIVES

Applying large language models (LLMs) to the clinical domain is challenging due to the context-heavy nature of processing medical records. Retrieval-augmented generation (RAG) offers a solution by facilitating reasoning over large text sources. However, there are many parameters to optimize in just the retrieval system alone. This paper presents an ablation study exploring how different embedding models and pooling methods affect information retrieval for the clinical domain.

MATERIALS AND METHODS

Evaluating on 3 retrieval tasks across 2 electronic health record (EHR) data sources, we compared 7 models, including medical- and general-domain models, specialized encoder embedding models, and off-the-shelf decoder LLMs. We also examined the choice of embedding pooling strategy for each model, applied independently to the query and to the text to retrieve.
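
To make "pooling strategy" concrete, the following is a minimal sketch of how per-token embeddings can be collapsed into a single vector, with the pooling method chosen separately for the query and for the text to retrieve. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the pooling methods compared, so the four shown here (mean, max, first token, last token) are common choices assumed for the example, the token vectors are toy values rather than model outputs, and cosine similarity is an assumed scoring function. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
TokenMatrix = List[Vector]  # one embedding vector per token


def mean_pool(tokens: TokenMatrix) -> Vector:
    """Average each dimension over all tokens."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def max_pool(tokens: TokenMatrix) -> Vector:
    """Take the largest value in each dimension over all tokens."""
    return [max(column) for column in zip(*tokens)]


def first_token_pool(tokens: TokenMatrix) -> Vector:
    """Use the first token's vector (CLS-style pooling in encoder models)."""
    return list(tokens[0])


def last_token_pool(tokens: TokenMatrix) -> Vector:
    """Use the last token's vector (often used with decoder models)."""
    return list(tokens[-1])


# Assumed set of strategies; the abstract does not list the ones studied.
POOLERS: Dict[str, Callable[[TokenMatrix], Vector]] = {
    "mean": mean_pool,
    "max": max_pool,
    "first": first_token_pool,
    "last": last_token_pool,
}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_passages(
    query_tokens: TokenMatrix,
    passage_tokens: List[TokenMatrix],
    query_pool: str,
    passage_pool: str,
) -> List[int]:
    """Return passage indices sorted by similarity to the query, best first.

    The query and the passages may use different pooling strategies.
    Ties are broken by the original passage index.
    """
    query_vec = POOLERS[query_pool](query_tokens)
    scored = [
        (cosine(query_vec, POOLERS[passage_pool](tokens)), index)
        for index, tokens in enumerate(passage_tokens)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]


# Toy 2-dimensional token embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
passages = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]

# Sweep every (query pooling, passage pooling) combination.
for query_pool in POOLERS:
    for passage_pool in POOLERS:
        ranking = rank_passages(query, passages, query_pool, passage_pool)
        print(query_pool, passage_pool, ranking)
```

Even in this toy setting the ranking flips depending on whether the query is pooled from its first or its last token, which is the kind of sensitivity an ablation over pooling choices is meant to expose.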

RESULTS

We found that the choice of embedding model significantly impacts retrieval performance, with BGE, a comparatively small general-domain model, consistently outperforming all others, including medical-specific models. However, our findings also revealed substantial variability across datasets and query text phrasings. We also determined the best pooling methods for each of these models to guide future design of retrieval systems.

DISCUSSION

The choice of embedding model, pooling strategy, and query formulation can significantly impact retrieval performance, and the performance of these models on other public benchmarks does not necessarily transfer to new domains. The high variability in performance across different query phrasings suggests that the choice of query may need to be tuned and validated for each task, or even for each institution's EHR.
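
One way to validate query phrasings for a given task is to score each candidate phrasing against the same labeled set and inspect the spread. The sketch below assumes recall@k as the metric and uses made-up rankings; the abstract does not state which metric the study used, and the function names and example phrasings are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def recall_at_k(ranked_ids: List[int], relevant_ids: List[int], k: int) -> float:
    """Fraction of relevant items that appear in the top k of a ranking."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def phrasing_spread(
    rankings_by_phrasing: Dict[str, List[int]],
    relevant_ids: List[int],
    k: int,
) -> Dict[str, object]:
    """Score every query phrasing and summarize how much the scores vary."""
    scores = {
        phrasing: recall_at_k(ranking, relevant_ids, k)
        for phrasing, ranking in rankings_by_phrasing.items()
    }
    values = list(scores.values())
    return {
        "scores": scores,
        "mean": mean(values),
        "range": max(values) - min(values),
    }


# Made-up rankings of note IDs returned for two phrasings of the same question.
rankings = {
    "What is the patient's diagnosis?": [1, 2, 3, 4],
    "diagnosis": [3, 4, 1, 2],
}
summary = phrasing_spread(rankings, relevant_ids=[1, 2], k=2)
print(summary)
```

A large range relative to the mean would indicate that the phrasing itself is a parameter worth tuning on local data before deployment.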

CONCLUSION

This study provides empirical evidence to guide the selection of models and pooling strategies for RAG frameworks in healthcare applications. Further studies such as this one are vital for guiding the empirically grounded development of retrieval frameworks, including those used in RAG, for the clinical domain.
