Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
Department of Healthcare Policy & Research, Weill Cornell Medicine, Cornell University, New York, NY, USA.
Bioinformatics. 2019 Apr 15;35(8):1395-1403. doi: 10.1093/bioinformatics/bty804.
Hypertension is a heterogeneous syndrome that needs improved subtyping based on phenotypic and genetic measurements. The goal is to identify subtypes of patients who share similar pathophysiologic mechanisms and may respond more uniformly to targeted treatments. Existing machine learning approaches often struggle both to integrate phenotype and genotype information and to present an interpretable model to clinicians. We aim to provide informed patient stratification based on phenotype and genotype features.
In this article, we present a hybrid non-negative matrix factorization (HNMF) method that integrates phenotype and genotype information for patient stratification. HNMF simultaneously approximates the phenotypic and genetic feature matrices, each under its own appropriate loss function, and generates patient subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF approximates the phenotypic matrix under Frobenius loss and the genetic matrix under Kullback-Leibler (KL) loss. We propose an alternating projected gradient method to solve the approximation problem. Simulations show that HNMF converges quickly and accurately to the true factor matrices. On a real-world clinical dataset, we used the patient factor matrix as features and examined the association of these features with indices of cardiac mechanics. We compared HNMF with six different models that use phenotype or genotype features alone, with or without NMF, or that use joint NMF with only one type of loss. We also compared HNMF with three recently published methods for integrative clustering analysis: iClusterBayes, Bayesian joint analysis and JIVE. HNMF significantly outperforms all comparison models. HNMF also reveals intuitive phenotype-genotype interactions that characterize cardiac abnormalities.
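The combination of a shared patient factor, two different losses and alternating projected gradient updates can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several details are assumptions made only for illustration. The phenotype matrix X (patients x phenotypes) and the genotype matrix G (patients x genetic features, e.g. counts) are assumed to share one patient factor W, with feature factors H and V. The objective is assumed to be 0.5 * ||X - W H||_F^2 + lam * KL(G || W V), where `lam` is a hypothetical weight. Each sweep takes one projected gradient step per block with Armijo-style backtracking, and nonnegativity is enforced by clipping at a small floor `EPS` so the KL term stays finite. The names `hybrid_loss`, `projected_step` and `hnmf_sketch` are hypothetical.

```python
import numpy as np

EPS = 1e-10  # floor for all factors: keeps W @ V > 0 so the KL term stays finite


def hybrid_loss(X, G, W, H, V, lam=1.0):
    """0.5 * ||X - W H||_F^2 + lam * KL(G || W V) (generalized KL divergence)."""
    R = X - W @ H
    WV = W @ V
    frob = 0.5 * np.sum(R * R)
    # Entries with G == 0 contribute only their WV term to the KL divergence.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_term = np.where(G > 0, G * np.log(G / WV), 0.0)
    kl = np.sum(log_term - G + WV)
    return frob + lam * kl


def projected_step(f, grad, Z, step=1.0, beta=0.5, sigma=0.01, max_tries=30):
    """One projected gradient step on block Z with Armijo-style backtracking.

    Returns Z unchanged if no tried step size gives sufficient decrease,
    so the objective never increases.
    """
    f0 = f(Z)
    for _ in range(max_tries):
        # Gradient step, then projection onto the convex set {Z >= EPS}.
        Z_new = np.maximum(Z - step * grad, EPS)
        # Sufficient-decrease test used by projected gradient methods.
        if f(Z_new) - f0 <= sigma * np.sum(grad * (Z_new - Z)):
            return Z_new
        step *= beta
    return Z


def hnmf_sketch(X, G, k, lam=1.0, n_iter=200, seed=0):
    """Alternate projected gradient steps over W, H and V; return factors and loss history."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = G.shape[1]
    W = rng.random((n, k)) + EPS  # shared patient factor (patients x k)
    H = rng.random((k, p)) + EPS  # phenotype factor (k x phenotypes)
    V = rng.random((k, q)) + EPS  # genotype factor (k x genetic features)
    history = [hybrid_loss(X, G, W, H, V, lam)]
    for _ in range(n_iter):
        # W appears in both terms, so its gradient combines both losses.
        ratio = G / (W @ V)
        grad_W = (W @ H - X) @ H.T + lam * (1.0 - ratio) @ V.T
        W = projected_step(lambda Z: hybrid_loss(X, G, Z, H, V, lam), grad_W, W)
        # H appears only in the Frobenius term.
        grad_H = W.T @ (W @ H - X)
        H = projected_step(lambda Z: hybrid_loss(X, G, W, Z, V, lam), grad_H, H)
        # V appears only in the KL term.
        ratio = G / (W @ V)
        grad_V = lam * W.T @ (1.0 - ratio)
        V = projected_step(lambda Z: hybrid_loss(X, G, W, H, Z, lam), grad_V, V)
        history.append(hybrid_loss(X, G, W, H, V, lam))
    return W, H, V, history


if __name__ == "__main__":
    # Synthetic data: continuous "phenotypes" and Poisson-count "genotypes".
    rng = np.random.default_rng(1)
    W_true = rng.random((40, 3))
    X = W_true @ rng.random((3, 10))
    G = rng.poisson(W_true @ rng.random((3, 25)))
    W, H, V, history = hnmf_sketch(X, G, k=3)
    subtypes = W.argmax(axis=1)  # dominant factor per patient (one simple convention)
    print(f"loss {history[0]:.2f} -> {history[-1]:.2f}")
```

Each block step is accepted only if it passes the sufficient-decrease test, so the recorded objective is non-increasing. Reading each patient's subtype as the argmax of its row of W is one simple convention, assumed here rather than taken from the paper.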
Our code is publicly available on GitHub at https://github.com/yuanluo/hnmf.
Supplementary data are available at Bioinformatics online.