Temporal self-attention for risk prediction from electronic health records using non-stationary kernel approximation.

Affiliations

AI Center for Precision Health, Weill Cornell Medicine-Qatar, Qatar.

College of Engineering, Qatar University, Qatar.

Publication information

Artif Intell Med. 2024 Mar;149:102802. doi: 10.1016/j.artmed.2024.102802. Epub 2024 Feb 10.

Abstract

Effective modeling of patient representations from electronic health records (EHRs) is an increasingly important research topic. Yet modeling the non-stationarity of EHR data has received less attention, and most existing studies make a strong assumption of stationarity when representing patients from EHRs. In practice, however, a patient's visits are irregularly spaced over a relatively long period, and disease progression patterns are non-stationary. Furthermore, the time gaps between visits often encode significant domain knowledge and may reveal undiscovered patterns that characterize specific medical conditions. To address these challenges, we introduce a new method that combines the self-attention mechanism with non-stationary kernel approximation to capture both contextual information and temporal relationships between patient visits in EHRs. To assess the effectiveness of the proposed approach, we use two real-world EHR datasets, comprising a total of 76,925 patients, for the task of predicting a patient's next diagnosis code given their EHR history. The first dataset is a general EHR cohort of 11,451 patients with a total of 3,485 unique diagnosis codes. The second dataset is a disease-specific cohort of 65,474 pregnant patients with a total of 9,782 unique diagnosis codes. Our experimental evaluation involved nine prediction models in three groups. Group 1 comprises the baselines: the original self-attention model with positional encoding, the RETAIN model, and an LSTM model. Group 2 includes self-attention models with stationary kernel approximations, specifically three variants of Bochner's feature maps. Group 3 consists of self-attention models with non-stationary kernel approximations based on quadratic, cubic, and bi-quadratic polynomials.
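
The difference between the stationary and non-stationary kernels compared above can be illustrated with a minimal sketch. It assumes plain Python and scalar visit timestamps. `bochner_features` is the standard random-Fourier-feature map that follows from Bochner's theorem: its inner product depends only on the time gap t1 - t2. `nonstationary_features` is a hypothetical illustration and not the paper's construction. It appends polynomial terms t, ..., t^degree, with degree 2, 3 and 4 loosely mirroring the quadratic, cubic and bi-quadratic variants, so the induced kernel also depends on absolute time. All function names and the N(0, 1) frequency distribution are assumptions.

```python
import math
import random


def bochner_features(t, omegas):
    """Stationary time encoding via random Fourier features (Bochner's theorem).

    phi(t) = sqrt(1/d) * [cos(w_1 t), sin(w_1 t), ..., cos(w_d t), sin(w_d t)],
    so phi(t1) . phi(t2) = (1/d) * sum_i cos(w_i * (t1 - t2)): only the gap matters.
    """
    scale = math.sqrt(1.0 / len(omegas))
    feats = []
    for w in omegas:
        feats.extend([scale * math.cos(w * t), scale * math.sin(w * t)])
    return feats


def nonstationary_features(t, omegas, degree=2):
    """HYPOTHETICAL non-stationary variant (an assumption, not the paper's map).

    Appends t, t^2, ..., t^degree to the Bochner features. The induced kernel gains
    the extra term sum_j (t1 * t2)^j, which depends on the absolute times t1 and t2.
    """
    return bochner_features(t, omegas) + [t ** j for j in range(1, degree + 1)]


def kernel(phi, t1, t2):
    """Kernel value as the inner product of the two feature vectors."""
    a, b = phi(t1), phi(t2)
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
omegas = [random.gauss(0.0, 1.0) for _ in range(16)]  # assumed N(0, 1) frequencies
stat = lambda t: bochner_features(t, omegas)
nonstat = lambda t: nonstationary_features(t, omegas, degree=2)

# Same 2-unit gap between visits, at early versus late absolute times.
print(kernel(stat, 1.0, 3.0), kernel(stat, 11.0, 13.0))        # equal up to rounding
print(kernel(nonstat, 1.0, 3.0), kernel(nonstat, 11.0, 13.0))  # clearly different
```

In a time-aware self-attention layer, one option is to concatenate such time features with each visit embedding before computing query-key dot products. This abstract does not state how the paper combines them.
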
The experimental results demonstrate that non-stationary kernels significantly outperformed the baseline methods on the NDCG@10 and Hit@10 metrics in both datasets, and the performance boost in NDCG@10 was more substantial on dataset 1. Stationary kernels, on the other hand, showed significant but smaller gains over the baselines and were nearly as effective as non-stationary kernels for Hit@10 on dataset 2. These findings robustly validate the efficacy of non-stationary kernels for temporal modeling of EHR data and emphasize the importance of modeling non-stationary temporal information in healthcare prediction tasks.
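
NDCG@10 and Hit@10 score a model's ranked list of predicted diagnosis codes against the codes actually recorded at the next visit. The abstract does not define them precisely, so the following is a sketch of one common binary-relevance formulation. It assumes each prediction is a list of unique codes ordered best first and the ground truth is a set of codes. The function names and the ICD-style codes in the example are hypothetical.

```python
import math


def hit_at_k(ranked_codes, relevant, k=10):
    """1.0 if any relevant code appears among the top-k predictions, else 0.0."""
    return 1.0 if any(c in relevant for c in ranked_codes[:k]) else 0.0


def ndcg_at_k(ranked_codes, relevant, k=10):
    """Binary-relevance NDCG@k: DCG of the top-k list divided by the ideal DCG.

    Assumes ranked_codes contains no duplicates.
    """
    dcg = sum(
        1.0 / math.log2(i + 2)  # i is 0-based, so the 1-based rank is i + 1
        for i, c in enumerate(ranked_codes[:k])
        if c in relevant
    )
    # Ideal ordering: every relevant code (up to k of them) placed at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["I10", "E11", "O24.4", "Z34.0"]  # hypothetical predictions, best first
relevant = {"E11"}                          # hypothetical true next-visit code
print(hit_at_k(ranked, relevant), ndcg_at_k(ranked, relevant))  # 1.0, then 1/log2(3)
```

Dataset-level scores would then be the mean of these per-patient values. Hit@10 only checks whether a correct code appears in the top 10. NDCG@10 also rewards placing it near the top, which is one reason the two metrics can rank models differently.
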
