Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
Mathematical Institute, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
Front Genet. 2013 Dec 16;4:252. doi: 10.3389/fgene.2013.00252. eCollection 2013.
Genome-wide association studies (GWAS) have identified thousands of DNA loci associated with a variety of traits. Statistical inference is almost always based on single-marker hypothesis tests of association and the respective p-values with Bonferroni correction. Since commercially available genomic arrays interrogate hundreds of thousands or even millions of loci simultaneously, many causal yet undetected loci are believed to exist because the conditional power to achieve a genome-wide significance level can be low, in particular for markers with small effect sizes and low minor allele frequencies and in studies with modest sample sizes. However, linkage disequilibrium (LD) induces correlation between neighboring markers in the human genome, and the resulting correlation among marker test statistics can be incorporated into multi-marker hypothesis tests, thereby increasing the power to detect association. Herein, we establish a theoretical benchmark by quantifying the maximum power achievable for multi-marker tests of association in case-control studies, which is attainable only when the causal marker is known. Using the fact that genotype correlations within an LD block translate into an asymptotically multivariate normal distribution for score test statistics, we develop a set of weights for the markers that maximize the non-centrality parameter, and assess the relative loss of power for other approaches. We find that the method of Conneely and Boehnke (2007), based on the maximum absolute test statistic observed in an LD block, is a practical and powerful method in a variety of settings. We also explore the effect on power of prior biological or functional knowledge used to narrow down the locus of the causal marker, and conclude that this prior knowledge has to be very strong and specific for the power to approach the maximum achievable level, or even to exceed the power observed for methods such as the one proposed by Conneely and Boehnke (2007).
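The weighting idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the score statistics Z in an LD block are multivariate normal with correlation matrix R and mean mu = delta * R[:, c], where c is the causal marker and delta its non-centrality. Under that assumption, the linear combination w'Z / sqrt(w'Rw) has non-centrality w'mu / sqrt(w'Rw), which is maximized by w proportional to R^{-1} mu. The AR(1) correlation structure, the function names, and the Monte Carlo p-value for the maximum absolute statistic (a simple stand-in for the numerical integration a production method would use) are all hypothetical choices made for illustration.

```python
import numpy as np


def ar1_correlation(m, rho):
    """Toy LD block: correlation rho**|i-j| between markers i and j (assumed)."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def optimal_weights(R, mu):
    """Weights maximizing w'mu / sqrt(w'Rw): w proportional to R^{-1} mu."""
    return np.linalg.solve(R, mu)


def noncentrality(w, R, mu):
    """Non-centrality of the standardized linear combination w'Z / sqrt(w'Rw)."""
    return float(w @ mu / np.sqrt(w @ R @ w))


def max_abs_pvalue(z_obs, R, n_sim=20000, seed=0):
    """Monte Carlo p-value for max|Z| under the null Z ~ N(0, R)."""
    rng = np.random.default_rng(seed)
    null = rng.multivariate_normal(np.zeros(len(R)), R, size=n_sim)
    return float(np.mean(np.abs(null).max(axis=1) >= np.max(np.abs(z_obs))))


# Hypothetical block of 5 markers; marker 2 is causal with non-centrality 3.
R = ar1_correlation(5, 0.8)
delta, causal = 3.0, 2
mu = delta * R[:, causal]

w_opt = optimal_weights(R, mu)               # all weight lands on the causal marker
ncp_opt = noncentrality(w_opt, R, mu)        # equals delta under these assumptions
ncp_equal = noncentrality(np.ones(5), R, mu)  # equal weights lose non-centrality
```

In this toy setting the optimal weights collapse onto the causal marker, which matches the statement that the benchmark is attainable only when the causal marker is known; any other weighting, such as equal weights, yields a smaller non-centrality parameter.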