Barnett Ian, Mukherjee Rajarshi, Lin Xihong
Department of Biostatistics, Harvard School of Public Health, Boston, MA.
Department of Statistics, Stanford University, Stanford, CA.
J Am Stat Assoc. 2017;112(517):64-76. doi: 10.1080/01621459.2016.1192039. Epub 2017 May 3.
It is of substantial interest to study the effects of genes, genetic pathways, and networks on the risk of complex diseases. Each of these genetic constructs contains multiple single nucleotide polymorphisms (SNPs), which are often correlated, function jointly, and may be large in number. However, only a sparse subset of SNPs in a genetic construct is generally associated with the disease of interest. In this article, we propose the generalized higher criticism (GHC) to test for the association between an SNP set and a disease outcome. The higher criticism is a test traditionally used in high-dimensional signal detection settings when marginal test statistics are independent and the number of parameters is very large. However, these assumptions do not always hold in genetic association studies, because of linkage disequilibrium among SNPs and the finite number of SNPs in the SNP set of each genetic construct. The proposed GHC overcomes these limitations of the higher criticism by allowing for arbitrary correlation structures among the SNPs in an SNP set, while performing accurate analytic p-value calculations for any finite number of SNPs in the SNP set. We obtain the detection boundary of the GHC test. Using simulations, we empirically compare the power of the GHC method with that of existing SNP-set tests over a range of genetic regions with varied correlation structures and signal sparsity. We apply the proposed methods to analyze the CGEM breast cancer genome-wide association study. Supplementary materials for this article are available online.
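For orientation, the classical higher criticism that the abstract contrasts with the GHC can be sketched in a few lines. The sketch below uses the standard textbook form for independent p-values, HC = max over i of sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))), where p_(i) is the i-th smallest p-value. It is not the authors' GHC: it assumes independence and does not account for linkage disequilibrium. The function name `higher_criticism` and the choice to maximize over all i (some variants restrict the range) are illustrative assumptions, not taken from the article.

```python
import math


def higher_criticism(p_values):
    """Classical higher criticism statistic, assuming independent p-values.

    Illustrative sketch only: this is the textbook independent-case form,
    not the generalized higher criticism (GHC) proposed in the article.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    ordered = sorted(p_values)
    best = -math.inf  # stays -inf if every p-value is exactly 0 or 1
    for i, p in enumerate(ordered, start=1):
        if not 0.0 < p < 1.0:
            continue  # the variance term p * (1 - p) is zero here
        # Standardized gap between the empirical fraction i/n and p_(i).
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# One small p-value among otherwise unremarkable ones (a sparse signal).
sparse = higher_criticism([0.001, 0.3, 0.5, 0.7, 0.9])
# No unusually small p-values.
null_like = higher_criticism([0.1, 0.3, 0.5, 0.7, 0.9])
print(sparse > null_like)
```

A single very small p-value drives the statistic up sharply, which is why the test suits sparse signals. With correlated SNPs, the variance term used here is no longer appropriate, which is the gap the abstract says the GHC addresses.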