TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records.

Authors

Afshar Ardavan, Perros Ioakeim, Park Haesun, deFilippi Christopher, Yan Xiaowei, Stewart Walter, Ho Joyce, Sun Jimeng

Affiliations

Georgia Institute of Technology.

HEALTH[at]SCALE.

Publication information

Proc ACM Conf Health Inference Learn (2020). 2020 Apr;2020:193-203. doi: 10.1145/3368555.3384464.

DOI:10.1145/3368555.3384464

PMID:33659966

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7924914/

Abstract

Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the sub-problems, our proposed mathematical re-formulations lead to efficient sub-problem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to 14× faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 60 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.
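The abstract does not spell out the objective, so the following is only a minimal sketch of what a joint temporal-plus-static factorization loss could look like. It assumes a PARAFAC2-style term (one visits-by-features matrix per patient, with a shared feature factor) coupled to a matrix-factorization term for static features through a shared patient-by-phenotype matrix. All names (`coupled_loss`, `X_list`, `U_list`, `W`, `V`, `F`, `lam`) are hypothetical, and the PARAFAC2 constraint on the per-patient factors and the non-negativity constraints are left out for brevity.

```python
import numpy as np

def coupled_loss(X_list, A, U_list, W, V, F, lam=1.0):
    """Squared-error loss for a PARAFAC2-style temporal term coupled with a
    static matrix-factorization term (illustrative only, not the authors' code)."""
    temporal = 0.0
    for k, (X_k, U_k) in enumerate(zip(X_list, U_list)):
        # Patient k: (visits x features) is approximated by U_k diag(W[k]) V^T.
        # Broadcasting U_k * W[k] scales column r of U_k by W[k, r].
        X_hat = (U_k * W[k]) @ V.T
        temporal += np.linalg.norm(X_k - X_hat) ** 2
    # Static features: (patients x static features) is approximated by W F^T,
    # reusing the same patient-by-phenotype matrix W as the temporal term.
    static = np.linalg.norm(A - W @ F.T) ** 2
    return temporal + lam * static

rng = np.random.default_rng(0)
R, J, P = 3, 5, 4          # assumed sizes: phenotypes, temporal features, static features
visits = [6, 2, 9]         # each patient may have a different number of visits
K = len(visits)

W = rng.random((K, R))     # patient-by-phenotype weights (shared by both terms)
V = rng.random((J, R))     # temporal-feature-by-phenotype definitions
F = rng.random((P, R))     # static-feature-by-phenotype definitions
U_list = [rng.random((n, R)) for n in visits]

# Build data that the factors reproduce exactly, so the loss should vanish.
X_list = [(U * W[k]) @ V.T for k, U in enumerate(U_list)]
A = W @ F.T

exact_loss = coupled_loss(X_list, A, U_list, W, V, F)
# Shifting every static entry by 1 adds K * P squared errors of size 1.
perturbed_loss = coupled_loss(X_list, A + 1.0, U_list, W, V, F)
```

Sharing `W` between the two terms is what makes the static and temporal views inform the same phenotype memberships in this sketch; an alternating solver would update one factor at a time while holding the others fixed.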

Similar articles

TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records.

Proc ACM Conf Health Inference Learn (2020). 2020 Apr;2020:193-203. doi: 10.1145/3368555.3384464.

Temporal phenotyping of medically complex children via PARAFAC2 tensor factorization.

J Biomed Inform. 2019 May;93:103125. doi: 10.1016/j.jbi.2019.103125. Epub 2019 Feb 8.

Communication Efficient Tensor Factorization for Decentralized Healthcare Networks.

Proc IEEE Int Conf Data Min. 2021 Dec;2021:1216-1221. doi: 10.1109/icdm51629.2021.00147. Epub 2022 Jan 24.

LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values.

KDD. 2020 Aug;2020:1625-1635. doi: 10.1145/3394486.3403213.

Limestone: high-throughput candidate phenotype generation via tensor factorization.

J Biomed Inform. 2014 Dec;52:199-211. doi: 10.1016/j.jbi.2014.07.001. Epub 2014 Jul 16.

Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics.

KDD. 2015 Aug;2015:1265-1274. doi: 10.1145/2783258.2783395.

SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping.

KDD. 2018 Jul;2018:2080-2089. doi: 10.1145/3219819.3219999.

COPA: Constrained PARAFAC2 for Sparse & Large Datasets.

Proc ACM Int Conf Inf Knowl Manag. 2018 Oct;2018:793-802. doi: 10.1145/3269206.3271775.

Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics.

Proc Int World Wide Web Conf. 2021 Apr;2021:171-182. doi: 10.1145/3442381.3449832.

Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis.

Proc ACM Int Conf Inf Knowl Manag. 2019 Nov;2019:1291-1300. doi: 10.1145/3357384.3357878.

Cited by

Longitudinal Metabolomics Data Analysis Informed by Mechanistic Models.

Metabolites. 2024 Dec 24;15(1):2. doi: 10.3390/metabo15010002.

MULTIPAR: Supervised Irregular Tensor Factorization with Multi-task Learning for Computational Phenotyping.

Proc Mach Learn Res. 2023 Dec;225:498-511.

Creating High-Quality Synthetic Health Data: Framework for Model Development and Validation.

JMIR Form Res. 2024 Apr 22;8:e53241. doi: 10.2196/53241.

Improving Diagnostics with Deep Forest Applied to Electronic Health Records.

Sensors (Basel). 2023 Jul 21;23(14):6571. doi: 10.3390/s23146571.

LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values.

KDD. 2020 Aug;2020:1625-1635. doi: 10.1145/3394486.3403213.

Untangling the complexity of multimorbidity with machine learning.

Mech Ageing Dev. 2020 Sep;190:111325. doi: 10.1016/j.mad.2020.111325. Epub 2020 Aug 6.
