Yin Jianxin, Li Hongzhe
University of Pennsylvania School of Medicine.
Ann Appl Stat. 2011 Dec;5(4):2630-2650. doi: 10.1214/11-AOAS494.
Genetical genomics experiments are now routinely conducted to measure both genetic markers and gene expression data on the same subjects. Gene expression levels are often treated as quantitative traits and subjected to standard genetic analysis in order to identify expression quantitative trait loci (eQTL). However, the genetic architecture of many gene expression traits may be complex, and a poorly estimated genetic architecture may compromise inference of the dependency structure of genes at the transcriptional level. In this paper, we introduce a sparse conditional Gaussian graphical model for studying the conditional independence relationships among a set of gene expressions while adjusting for possible genetic effects, where the gene expressions are modeled with seemingly unrelated regressions. We present an efficient coordinate descent algorithm to obtain penalized estimates of both the regression coefficients and the sparse concentration matrix. The corresponding graph can be used to determine the conditional independence among a group of genes while adjusting for shared genetic effects. Simulation experiments, together with asymptotic convergence rates and sparsistency results, are used to justify the proposed methods. By sparsistency, we mean the property that all parameters that are zero are estimated as zero with probability tending to one. We apply our methods to the analysis of a yeast eQTL data set and demonstrate that the conditional Gaussian graphical model leads to a more interpretable gene network than the standard Gaussian graphical model based on gene expression data alone.
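The effect of adjusting for shared genetic effects can be illustrated with a small simulation. The sketch below is not the penalized coordinate descent algorithm of the paper: it assumes ordinary least squares for the regression step and a plain matrix inverse for the concentration matrix, with no sparsity penalty. The data (one binary marker, three genes, the effect sizes) and the helper names `partial_correlations` and `residualize` are hypothetical and chosen only for illustration.

```python
import numpy as np

def partial_correlations(samples):
    """Partial correlations from the inverse sample covariance (columns = variables)."""
    precision = np.linalg.inv(np.cov(samples, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def residualize(expr, markers):
    """Remove marker effects from each gene by unpenalized least squares with an intercept."""
    design = np.column_stack([np.ones(len(markers)), markers])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ coef

# Hypothetical toy data: one binary marker acts on gene1 and gene2,
# and gene3 depends directly on gene2 (the only true gene-gene edge).
rng = np.random.default_rng(0)
n = 5000
marker = rng.integers(0, 2, size=n).astype(float)
noise = rng.normal(size=(n, 3))
gene1 = 2.0 * marker + noise[:, 0]
gene2 = 2.0 * marker + noise[:, 1]
gene3 = 0.8 * gene2 + noise[:, 2]
expr = np.column_stack([gene1, gene2, gene3])

# Standard Gaussian graphical model: expression data alone.
marginal = partial_correlations(expr)
# Conditional Gaussian graphical model: expression adjusted for the marker.
conditional = partial_correlations(residualize(expr, marker[:, None]))

print("gene1-gene2, expression only:    ", round(marginal[0, 1], 3))
print("gene1-gene2, adjusted for marker:", round(conditional[0, 1], 3))
print("gene2-gene3, adjusted for marker:", round(conditional[1, 2], 3))
```

In this toy setting the shared marker induces a gene1-gene2 partial correlation in the unadjusted model that largely disappears after adjustment, while the direct gene2-gene3 dependence remains. The sparse method described above would additionally place penalties on the regression coefficients and on the concentration matrix.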