Xia Yin, Cai Tianxi, Cai T Tony
Department of Statistics & Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514, USA.
Department of Biostatistics, Harvard School of Public Health, Harvard University, Boston, Massachusetts 02115, USA.
Biometrika. 2015 Jun;102(2):247-266. doi: 10.1093/biomet/asu074. Epub 2015 Mar 2.
Model organisms and human studies have led to increasing empirical evidence that interactions among genes contribute broadly to genetic variation of complex traits. In the presence of gene-by-gene interactions, the dimensionality of the feature space becomes extremely high relative to the sample size. This imposes a significant methodological challenge in identifying gene-by-gene interactions. In the present paper, through a Gaussian graphical model framework, we translate the problem of identifying gene-by-gene interactions associated with a binary trait into an inference problem on the difference of two high-dimensional precision matrices, which summarize the conditional dependence network structures of the genes. We propose a procedure for testing the differential network globally that is particularly powerful against sparse alternatives. In addition, a multiple testing procedure with false discovery rate control is developed to infer the specific structure of the differential network. Theoretical justification is provided to ensure the validity of the proposed tests and optimality results are derived under sparsity assumptions. A simulation study demonstrates that the proposed tests maintain the desired error rates under the null and have good power under the alternative. The methods are applied to a breast cancer gene expression study.
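As a rough illustration of the framing described above (comparing the precision matrices of two groups defined by a binary trait), the following sketch estimates each group's precision matrix by inverting its sample covariance and checks the largest off-diagonal difference with a label-permutation test. This is not the procedure proposed in the paper: it assumes a low-dimensional setting with far more samples than genes, where the sample covariance is invertible, and the permutation test stands in for the paper's global test. All function names and parameters here are hypothetical.

```python
import numpy as np


def sample_gaussian(precision, n, rng):
    """Draw n samples from a zero-mean Gaussian with the given precision matrix."""
    cov = np.linalg.inv(precision)
    return rng.multivariate_normal(np.zeros(precision.shape[0]), cov, size=n)


def precision_difference(x1, x2):
    """Naive estimate of Omega_2 - Omega_1 by inverting sample covariances.

    Assumption: n is much larger than p in both groups, so the sample
    covariance is invertible. This does not hold in the high-dimensional
    setting that the paper addresses.
    """
    omega1 = np.linalg.inv(np.cov(x1, rowvar=False))
    omega2 = np.linalg.inv(np.cov(x2, rowvar=False))
    return omega2 - omega1


def max_abs_offdiag(d):
    """Largest absolute off-diagonal entry of a square matrix."""
    off = d - np.diag(np.diag(d))
    return np.abs(off).max()


def permutation_global_test(x1, x2, n_perm=200, seed=0):
    """Permutation p-value for the max off-diagonal difference.

    A stand-in global test (not the paper's): group labels are shuffled
    and the statistic is recomputed to build a reference distribution.
    """
    rng = np.random.default_rng(seed)
    observed = max_abs_offdiag(precision_difference(x1, x2))
    pooled = np.vstack([x1, x2])
    n1 = x1.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = max_abs_offdiag(
            precision_difference(pooled[idx[:n1]], pooled[idx[n1:]])
        )
        exceed += int(stat >= observed)
    return observed, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    omega1 = np.eye(5)                      # group 1: no conditional dependence
    omega2 = np.eye(5)
    omega2[0, 1] = omega2[1, 0] = 0.5       # group 2: one planted differential edge
    x1 = sample_gaussian(omega1, 2000, rng)
    x2 = sample_gaussian(omega2, 2000, rng)
    stat, p_value = permutation_global_test(x1, x2)
    print(f"max |difference| = {stat:.3f}, permutation p-value = {p_value:.4f}")
```

The planted edge between variables 0 and 1 plays the role of a gene-by-gene interaction that is present in one trait group only. With many genes and few samples, inverting the sample covariance is no longer possible, which is the regime the paper's tests are designed for.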