Saegusa Takumi, Shojaie Ali
Department of Mathematics, University of Maryland, College Park, MD 20742 USA.
Department of Biostatistics, University of Washington, Seattle, WA 98195 USA.
Electron J Stat. 2016;10(1):1341-1392. doi: 10.1214/16-EJS1137. Epub 2016 May 31.
We introduce a general framework for estimation of inverse covariance, or precision, matrices from heterogeneous populations. The proposed framework uses a Laplacian shrinkage penalty to encourage similarity among estimates from disparate, but related, subpopulations, while allowing for differences among matrices. We propose an efficient alternating direction method of multipliers (ADMM) algorithm for parameter estimation, as well as its extension for faster computation in high dimensions by thresholding the empirical covariance matrix to identify the joint block diagonal structure in the estimated precision matrices. We establish both variable selection and norm consistency of the proposed estimator for distributions with exponential or polynomial tails. Further, to extend the applicability of the method to settings with unknown population structure, we propose a Laplacian penalty based on hierarchical clustering, and discuss conditions under which this data-driven choice results in consistent estimation of precision matrices in heterogeneous populations. Extensive numerical studies and applications to gene expression data from subtypes of cancer with distinct clinical outcomes indicate the potential advantages of the proposed method over existing approaches.
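Two computational ideas in the abstract, Laplacian shrinkage across subpopulations and screening for a joint block diagonal structure, can be illustrated with a minimal NumPy sketch. It assumes the penalty is built from the quadratic form x^T L x, where L = D - W is the graph Laplacian of a symmetric similarity matrix W over the K subpopulations, and it assumes a simple screening rule: variables i and j are linked when |S_k[i, j]| exceeds a threshold lam in at least one subpopulation k, and the candidate blocks are the connected components of that graph. The helper names `graph_laplacian`, `laplacian_shrinkage`, and `joint_block_screen` are hypothetical; the paper's exact penalty (for example, whether it takes a square root or how it is weighted against an l1 term), its tuning parameters, and its threshold may differ, and this is not the authors' ADMM implementation.

```python
import numpy as np


def graph_laplacian(W):
    """Laplacian L = D - W of a symmetric, nonnegative similarity matrix W
    over the K subpopulations (hypothetical helper)."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


def laplacian_shrinkage(x, W):
    """Quadratic Laplacian penalty x^T L x for one precision-matrix entry
    x = (omega_ij^(1), ..., omega_ij^(K)) across K subpopulations.
    Equals 0.5 * sum_{k,l} W[k, l] * (x_k - x_l)**2, so it is small when
    subpopulations with large similarity W[k, l] have similar estimates.
    (Assumed form; the paper's penalty may differ.)"""
    x = np.asarray(x, dtype=float)
    return float(x @ graph_laplacian(W) @ x)


def joint_block_screen(covs, lam):
    """Assumed screening rule: connect variables i and j if |S_k[i, j]| > lam
    in at least one subpopulation k, then return the connected components of
    that graph as candidate diagonal blocks (lists of variable indices)."""
    p = covs[0].shape[0]
    adj = np.zeros((p, p), dtype=bool)
    for S in covs:
        adj |= np.abs(S) > lam
    np.fill_diagonal(adj, False)  # self-loops do not affect components

    labels = [-1] * p
    blocks = []
    for start in range(p):
        if labels[start] != -1:
            continue
        # Depth-first search over the thresholded graph.
        labels[start] = len(blocks)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(adj[i]).tolist():
                if labels[j] == -1:
                    labels[j] = len(blocks)
                    stack.append(j)
        blocks.append(sorted(members))
    return blocks


# Two subpopulations, three variables; variable 2 is only weakly correlated.
S1 = np.array([[1.0, 0.6, 0.05], [0.6, 1.0, 0.02], [0.05, 0.02, 1.0]])
S2 = np.array([[1.0, 0.1, 0.03], [0.1, 1.0, 0.04], [0.03, 0.04, 1.0]])
print(joint_block_screen([S1, S2], lam=0.3))  # [[0, 1], [2]]

W = np.array([[0.0, 1.0], [1.0, 0.0]])  # two related subpopulations
print(laplacian_shrinkage([0.5, 0.5], W))   # 0.0 (estimates agree)
print(laplacian_shrinkage([0.5, -0.5], W))  # 1.0 (estimates disagree)
```

Because x^T L x = 0.5 * sum over k, l of W_kl (x_k - x_l)^2, this penalty is zero when related subpopulations share the same entry and grows with their disagreement. Under the assumed screening rule, each resulting block can then be handled as a smaller, separate problem.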