Afshar Ardavan, Perros Ioakeim, Park Haesun, deFilippi Christopher, Yan Xiaowei, Stewart Walter, Ho Joyce, Sun Jimeng
Georgia Institute of Technology.
HEALTH[at]SCALE.
Proc ACM Conf Health Inference Learn (2020). 2020 Apr;2020:193-203. doi: 10.1145/3368555.3384464.
Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., a heart failure group and a diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most existing work either assumes a static patient representation built from aggregate data or models only temporal data. However, real EHR data contain both temporal information (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE), which jointly models static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler sub-problems that are solved optimally in an alternating fashion. For each sub-problem, our proposed mathematical reformulations lead to an efficient solver. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to 14× faster than several baselines, and a cardiologist confirmed that the resulting phenotypes are clinically meaningful. Using 60 phenotypes extracted by TASTE, a simple logistic regression achieves the same level of area under the curve (AUC) for HF prediction as a deep learning model based on recurrent neural networks (RNN) with 345 features.
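To make the idea of coupling a PARAFAC2-style temporal model with a non-negative matrix factorization of static data more concrete, the following is a minimal sketch of how a joint reconstruction error could be evaluated. It is not the authors' implementation. Several things here are assumptions that the abstract does not state: that each patient `k` has a visits-by-features matrix `X_list[k]` approximated as `U_k diag(W[k]) V^T`, that the static matrix `A` is approximated as `W F^T`, that the per-patient weight matrix `W` is what links the two parts, and that a hypothetical weight `lam` balances them. The usual PARAFAC2 constraint on the `U_k` factors and the non-negativity constraints are not enforced; the function only computes the error.

```python
import numpy as np

def coupled_loss(X_list, A, U_list, W, V, F, lam=1.0):
    """Temporal reconstruction error plus lam times static reconstruction error.

    All names and the weight lam are illustrative assumptions, not taken
    from the paper's text.
    """
    temporal = 0.0
    for k, (X_k, U_k) in enumerate(zip(X_list, U_list)):
        S_k = np.diag(W[k])  # row k of W as a diagonal R x R matrix
        temporal += np.linalg.norm(X_k - U_k @ S_k @ V.T, "fro") ** 2
    static = np.linalg.norm(A - W @ F.T, "fro") ** 2
    return temporal + lam * static

# Synthetic, non-negative factors (random toy data, not EHR data).
rng = np.random.default_rng(0)
K, J, P, R = 4, 6, 3, 2          # patients, temporal features, static features, phenotypes
visits = [5, 3, 7, 4]            # each patient has a different number of visits
V = rng.random((J, R))
F = rng.random((P, R))
W = rng.random((K, R))
U_list = [rng.random((n, R)) for n in visits]

# Build data that the assumed model explains exactly.
X_list = [U @ np.diag(W[k]) @ V.T for k, U in enumerate(U_list)]
A = W @ F.T

exact = coupled_loss(X_list, A, U_list, W, V, F)
perturbed = coupled_loss(X_list, A, U_list, W + 0.1, V, F)
print(exact, perturbed)
```

Because `W` appears in both terms, perturbing it raises the temporal and the static error at once, which is the sense in which the two data sources constrain the same patient representation in this sketch. An alternating scheme like the one described in the abstract would update one factor at a time while holding the others fixed.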