Cai T Tony, Li Hongzhe, Liu Weidong, Xie Jichun
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA.
Biometrika. 2013 Mar;100(1):139-156. doi: 10.1093/biomet/ass058. Epub 2012 Nov 30.
Motivated by the analysis of genetical genomics data, we introduce a sparse high-dimensional multivariate regression model for studying conditional independence relationships among a set of genes while adjusting for possible genetic effects. The precision matrix in the model specifies a covariate-adjusted Gaussian graph, which represents the conditional dependence structure of gene expression after the confounding genetic effects on gene expression are taken into account. We present a covariate-adjusted precision matrix estimation method using constrained ℓ1 minimization, which can be easily implemented by linear programming. Asymptotic convergence rates in various matrix norms and sign consistency are established for the estimators of the regression coefficients and the precision matrix, allowing both the number of genes and the number of genetic variants to diverge. Simulations show that the proposed method yields significant improvements in both precision matrix estimation and graphical structure selection compared with the standard Gaussian graphical model, which assumes constant means. The proposed method is also applied to a yeast genetical genomics data set to identify the gene network among a set of genes in the mitogen-activated protein kinase pathway.
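To make the phrase "constrained ℓ1 minimization, which can be easily implemented by linear programming" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a column-wise program of the form: minimize ||b||_1 subject to ||S b - e_j||_∞ ≤ λ, where S is the covariance of the covariate-adjusted residuals. Two simplifications are assumptions of this sketch and are not taken from the abstract: the covariate effects are removed by ordinary least squares rather than by a sparse regression estimator, and the final matrix is symmetrised by keeping the entry of smaller magnitude. The function names `clime_column` and `covariate_adjusted_precision` and the tuning parameter `lam` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def clime_column(S, j, lam):
    """Solve min ||b||_1 subject to ||S b - e_j||_inf <= lam as a linear program."""
    p = S.shape[0]
    e = np.zeros(p)
    e[j] = 1.0
    # Split b = u - v with u, v >= 0; at the optimum sum(u) + sum(v) = ||b||_1.
    c = np.ones(2 * p)
    # Encode  S(u - v) - e_j <= lam  and  -(S(u - v) - e_j) <= lam.
    A_ub = np.block([[S, -S], [-S, S]])
    b_ub = np.concatenate([lam + e, lam - e])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


def covariate_adjusted_precision(Y, X, lam):
    """Y: n x p responses (e.g. gene expression); X: n x q covariates (e.g. markers)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Assumption of this sketch: ordinary least squares removes the covariate effects.
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    R = Yc - Xc @ coef
    S = R.T @ R / R.shape[0]
    p = S.shape[0]
    B = np.column_stack([clime_column(S, j, lam) for j in range(p)])
    # Assumption of this sketch: symmetrise by keeping the smaller-magnitude entry.
    return np.where(np.abs(B) <= np.abs(B.T), B, B.T)
```

A call such as `covariate_adjusted_precision(Y, X, lam=0.1)` returns a p x p matrix whose nonzero off-diagonal entries suggest edges of the covariate-adjusted graph. The value of `lam` here is arbitrary; in practice it would need to be tuned, for example by cross-validation.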