School of Biosciences, University of Birmingham, Birmingham, United Kingdom.
PLoS One. 2011;6(8):e23192. doi: 10.1371/journal.pone.0023192. Epub 2011 Aug 9.
It is well established that the theoretical basis of genome-wide association studies (GWAS), whose use has surged in recent years, is the statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for its influence, either by predicting population-structure parameters or by correcting the inflation of the test statistic caused by stratification, these methods may be infeasible or may introduce further statistical problems in practical implementation.
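For context, a common example of the "correct the inflation of the test statistic" family is genomic control. It is not necessarily one of the methods the authors compared. The sketch below shows it in plain Python. The inflation factor λ is estimated as the median of the observed 1-df chi-square association statistics divided by the expected null median (about 0.4549), and every statistic is then divided by λ. The function names `genomic_control` and `chi2_1df_pvalue` are illustrative only. Because one λ is applied genome-wide, this kind of correction cannot reflect stratification effects that differ between genomic regions.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom (null expectation).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Illustrative genomic-control correction for 1-df association statistics.

    Returns (lambda_gc, corrected_stats). By convention, lambda is not allowed
    to fall below 1, so statistics are never inflated by the correction.
    """
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)
    # A single genome-wide factor: every marker is deflated by the same amount.
    return lam, [x / lam for x in chi2_stats]


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a 1-df chi-square statistic: P(Z^2 > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))
```

For example, `genomic_control([0.5, 1.2, 0.9, 4.0, 0.7])` estimates λ ≈ 1.98 and divides all five statistics by that same value, whichever region of the genome each marker comes from.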
We propose here a novel statistical method to control spurious LD arising from population structure in GWAS by incorporating a control marker into the test of significance for the genetic association between a polymorphic marker and phenotypic variation in a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and properly accounts for the varying effect of population stratification on different regions of the genome under study. The utility and statistical properties of the new method were tested through an extensive computer simulation study and through association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations.
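The abstract does not give the authors' test statistic. The following is therefore only a minimal sketch of the general idea of "incorporating a control marker" into an association test, and it rests on our own assumption: the control marker's genotype enters as a covariate in an ordinary least-squares regression of the phenotype on the tested marker. This is not the published method. The function names `naive_test` and `control_marker_test` are hypothetical, and genotypes are assumed to be coded 0/1/2. The adjusted effect is computed via the Frisch-Waugh-Lovell theorem, which works as follows:

1. Residualize both the phenotype and the tested marker on the control marker.
2. Regress the phenotype residuals on the marker residuals.
3. Use n - 3 residual degrees of freedom for the t statistic.

```python
import math


def naive_test(y, g):
    """Unadjusted OLS slope of phenotype y on genotype g and its t statistic (n - 2 df)."""
    n = len(y)
    my, mg = sum(y) / n, sum(g) / n
    sgg = sum((x - mg) ** 2 for x in g)
    b = sum((x - mg) * (v - my) for x, v in zip(g, y)) / sgg
    rss = sum((v - my - b * (x - mg)) ** 2 for x, v in zip(g, y))
    return b, b / math.sqrt(rss / (n - 2) / sgg)


def _residualize(v, c):
    """Residuals of v after simple linear regression on c (with intercept)."""
    n = len(v)
    mv, mc = sum(v) / n, sum(c) / n
    scc = sum((ci - mc) ** 2 for ci in c)
    slope = sum((ci - mc) * (vi - mv) for ci, vi in zip(c, v)) / scc
    return [vi - mv - slope * (ci - mc) for ci, vi in zip(c, v)]


def control_marker_test(y, g, c):
    """Hypothetical control-marker-adjusted test: fit y = b0 + b1*g + b2*c by OLS.

    Returns (b1, t statistic for b1) with n - 3 residual degrees of freedom,
    using the Frisch-Waugh-Lovell theorem (residualize y and g on c first).
    """
    n = len(y)
    ry, rg = _residualize(y, c), _residualize(g, c)
    sgg = sum(x * x for x in rg)
    b1 = sum(a * b for a, b in zip(rg, ry)) / sgg
    rss = sum((a - b1 * b) ** 2 for a, b in zip(ry, rg))
    se = math.sqrt(rss / (n - 3) / sgg)
    return b1, b1 / se
```

Suppose the phenotype depends only on an ancestry-tagging control marker `c`, and the tested marker `g` is correlated with `c`. In that case `naive_test(y, g)` reports a spurious nonzero slope, while `control_marker_test(y, g, c)` returns a slope near zero. Whether this simple covariate form matches the authors' statistic cannot be determined from the abstract alone.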
RESULTS/CONCLUSIONS: The analyses show that, compared with two other methods widely used in the GWAS literature, the new method confers improved statistical power for detecting genuine genetic associations in subpopulations and effectively controls spurious associations stemming from population structure.