Wu Chengqing, DeWan Andrew, Hoh Josephine, Wang Zuoheng
Department of Epidemiology and Public Health, Yale University, New Haven, CT 06510, USA.
Ann Hum Genet. 2011 May;75(3):418-27. doi: 10.1111/j.1469-1809.2010.00639.x. Epub 2011 Jan 31.
Population stratification is an important issue in case-control studies of disease-marker association. Failure to properly account for population structure can lead to spurious association or reduced power. In this article, we compare the performance of six methods that correct for population stratification in case-control association studies: genomic control (GC), EIGENSTRAT, principal component-based logistic regression (PCA-L), LAPSTRUCT, ROADTRIPS, and EMMAX. We also include the uncorrected Armitage test for comparison. In the simulation studies, we consider a wide range of population structure models for unrelated samples, including admixture. Our simulation results suggest that PCA-L and LAPSTRUCT perform well in all the scenarios studied, whereas GC, ROADTRIPS, and EMMAX fail to correct for population structure at single nucleotide polymorphisms (SNPs) that show strong differentiation across ancestral populations. The Armitage test does not adjust for confounding due to stratification and therefore has an inflated type I error rate. Among all correction methods, EMMAX has the greatest power under the population structure settings considered for samples of unrelated individuals. The three methods EIGENSTRAT, PCA-L, and LAPSTRUCT are comparable to one another and outperform both GC and ROADTRIPS in almost all situations.
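To make the contrast between the uncorrected Armitage test and genomic control concrete, the following is a minimal Python sketch, not the simulation design used in the article. It assumes a simple two-population model with Balding-Nichols allele frequencies, null SNPs only, and hypothetical parameter values (sample sizes, Fst, case fractions); all function names are illustrative. The Armitage trend statistic is computed through its standard equivalence to N times the squared genotype-phenotype correlation, and GC rescales every statistic by a single inflation factor estimated from the median.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df
CHI2_1DF_CRIT_05 = 3.841459  # 5% critical value of a chi-square with 1 df


def armitage_trend_stats(genotypes, phenotype):
    """Armitage trend statistic per SNP, computed as N * r^2.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts.
    phenotype: (n_samples,) array of 0 (control) / 1 (case).
    """
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    cov = g.T @ y
    denom = (g ** 2).sum(axis=0) * (y ** 2).sum()
    # Monomorphic SNPs (zero variance) get a statistic of 0.
    r2 = np.divide(cov ** 2, denom, out=np.zeros(cov.shape), where=denom > 0)
    return len(phenotype) * r2


def genomic_control(stats):
    """Divide all statistics by one inflation factor (lambda, floored at 1)."""
    lam = max(np.median(stats) / CHI2_1DF_MEDIAN, 1.0)
    return stats / lam, lam


def simulate_stratified(n_per_pop=500, n_snps=2000, fst=0.05,
                        case_frac=(0.7, 0.3), seed=0):
    """Null SNPs in two populations that differ in disease prevalence.

    All parameter values here are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, size=n_snps)
    a = ancestral * (1 - fst) / fst
    b = (1 - ancestral) * (1 - fst) / fst
    genos, phenos = [], []
    for frac in case_frac:
        freq = rng.beta(a, b)  # Balding-Nichols population frequencies
        genos.append(rng.binomial(2, freq, size=(n_per_pop, n_snps)))
        phenos.append((rng.random(n_per_pop) < frac).astype(float))
    return np.vstack(genos), np.concatenate(phenos)


if __name__ == "__main__":
    G, y = simulate_stratified()
    raw = armitage_trend_stats(G, y)
    corrected, lam = genomic_control(raw)
    print("inflation factor lambda:", round(lam, 2))
    print("type I error, uncorrected:", np.mean(raw > CHI2_1DF_CRIT_05))
    print("type I error, after GC:", np.mean(corrected > CHI2_1DF_CRIT_05))
```

Because no SNP in this sketch is causal, every rejection is a false positive, so the uncorrected rejection rate directly shows the inflation caused by stratification. GC applies the same divisor to every SNP, so it cannot treat strongly differentiated SNPs differently from weakly differentiated ones, which is consistent with the limitation reported above.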