
HEART: Learning better representation of EHR data with a heterogeneous relation-aware transformer.

Affiliations

Department of Computer Science, Yale University, United States.


Publication Information

J Biomed Inform. 2024 Nov;159:104741. doi: 10.1016/j.jbi.2024.104741. Epub 2024 Oct 29.

Abstract

OBJECTIVE

Pretrained language models have recently demonstrated their effectiveness in modeling Electronic Health Record (EHR) data by treating patient encounters as sentences. However, existing methods fall short of utilizing the inherent heterogeneous correlations between medical entities, which include diagnoses, medications, procedures, and lab tests. Existing studies either focus solely on diagnosis entities or encode different entities in a homogeneous space, leading to suboptimal performance. Motivated by this, we aim to develop a foundational language model pretrained on EHR data that explicitly incorporates the heterogeneous correlations among these entities.

METHODS

In this study, we propose HEART, a heterogeneous relation-aware transformer for EHR. Our model includes a range of heterogeneous entities within each input sequence and represents pairwise relationships between entities as a relation embedding. Such a higher-order representation allows the model to perform complex reasoning and derive attention weights in the heterogeneous context. Additionally, a multi-level attention scheme is employed to exploit the connections between different encounters while mitigating high computational costs. For pretraining, HEART uses two tasks, missing entity prediction and anomaly detection, both of which effectively enhance the model's performance on various downstream tasks.
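To make the core idea concrete, the following is a minimal sketch (not the paper's actual implementation) of single-head attention whose logits are biased by a learned embedding for each pair of entity types, so that, for example, a medication token attends to a diagnosis differently than to a lab result. All names and the scalar-bias projection `wr` are illustrative assumptions; the paper's relation embeddings may enter the attention computation differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy set of heterogeneous entity types within one encounter.
ENTITY_TYPES = ["diagnosis", "medication", "procedure", "lab"]
D = 8  # embedding dimension

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_attention(H, types, rel_emb, Wq, Wk, Wv, wr):
    """Single-head attention with a heterogeneous relation bias.
    H: (n, D) entity embeddings; types: length-n list of type indices;
    rel_emb: (T, T, D) embedding per (type_i, type_j) relation;
    wr: (D,) projects a relation embedding down to a scalar logit bias."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    logits = Q @ K.T / np.sqrt(D)
    n = len(types)
    for i in range(n):
        for j in range(n):
            # Bias depends only on the pair of entity types involved.
            logits[i, j] += rel_emb[types[i], types[j]] @ wr
    A = softmax(logits, axis=-1)
    return A @ V, A

n, T = 5, len(ENTITY_TYPES)
H = rng.normal(size=(n, D))
types = [0, 0, 1, 2, 3]  # two diagnoses, a medication, a procedure, a lab
rel_emb = rng.normal(size=(T, T, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
wr = rng.normal(size=D)

out, attn = relation_aware_attention(H, types, rel_emb, Wq, Wk, Wv, wr)
print(out.shape, attn.shape)  # (5, 8) (5, 5)
```

In a full model, `rel_emb`, `wr`, and the projection matrices would be learned end-to-end, and inspecting `rel_emb` after training is what enables the interpretability analysis described in the results.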

RESULTS

Extensive experiments on two EHR datasets and five downstream tasks demonstrate HEART's superior performance compared to four state-of-the-art (SOTA) foundation models. For instance, HEART achieves improvements of 12.1% and 4.1% over Med-BERT in death and readmission prediction, respectively. Additionally, case studies show that HEART offers interpretable insights into the relationships between entities through the learned relation embeddings.

CONCLUSION

We study the problem of EHR representation learning and propose HEART, a model that leverages the heterogeneous relationships between medical entities. Our approach includes a multi-level encoding scheme and two specialized pretraining objectives, designed to boost both the efficiency and effectiveness of the model. We comprehensively evaluated HEART across five clinically significant downstream tasks using two EHR datasets. The experimental results confirm the model's strong performance and validate its practical utility in healthcare applications. Code: https://github.com/Graph-and-Geometric-Learning/HEART.

