Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts.
Department of Neurology, Boston University School of Medicine, Boston, Massachusetts.
Genet Epidemiol. 2020 Oct;44(7):702-716. doi: 10.1002/gepi.22332. Epub 2020 Jun 30.
Population stratification can inflate the type I error rate and produce spurious associations when genetic variants are tested for association with an outcome. Many genetic association studies now use exonic variants, which capture only about 1% of the genome. However, population stratification adjustments have not been evaluated in the context of exonic variants. Using simulations and data from the Framingham Heart Study, we compare the performance of two established approaches, principal components analysis (PCA) and mixed-effects models. We also assess the utility of genome-wide (GW) versus exonic variants for computing these adjustments. Our results show that the principal components (PCs) and genetic relationship matrices computed from GW and exonic markers differ. Even so, the type I error rate of association tests for common variants with additive effects appears to be properly controlled in the presence of population stratification. In addition, we consider single nucleotide variants (SNVs) with different levels of confounding by population stratification and compare the power of approaches that account for it, such as PC-based corrections and mixed-effects models. The two methods achieve similar power for SNVs with a low or medium level of confounding by population stratification, whereas mixed-effects models can achieve higher power for SNVs that are highly confounded by population stratification.
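A minimal sketch of how a PC-based correction works may help make the comparison concrete. It assumes GCTA-style standardization of 0/1/2 genotypes by $2p$ and $\sqrt{2p(1-p)}$, a toy two-population simulation, and a top PC obtained by power iteration on the genetic relationship matrix (GRM). All helper names (`standardize`, `top_pc`, `assoc_t`, `simulate`) and parameter values are hypothetical. This is not the simulation design or software used in the study. The GRM built here is the kind of matrix a mixed-effects model uses as the random-effect covariance, but fitting such a model (variance-component estimation) is not shown.

```python
import math
import random

def standardize(genotypes):
    """Center each variant by 2p and scale by sqrt(2p(1-p)); drop monomorphic variants."""
    n = len(genotypes)
    cols = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)  # estimated allele frequency
        if 0 < p < 1:
            sd = math.sqrt(2 * p * (1 - p))
            cols.append([(x - 2 * p) / sd for x in col])
    return [[c[i] for c in cols] for i in range(n)], len(cols)

def top_pc(grm, iters=100, seed=1):
    """Leading eigenvector of the GRM by power iteration, centered and unit-norm."""
    n = len(grm)
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in grm]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mean = sum(v) / n
    v = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def residualize(x, pcs):
    """Remove the intercept and projections onto centered, orthonormal PCs."""
    mean = sum(x) / len(x)
    r = [xi - mean for xi in x]
    for pc in pcs:
        proj = sum(a * b for a, b in zip(r, pc))
        r = [ri - proj * p for ri, p in zip(r, pc)]
    return r

def assoc_t(y, g, pcs=()):
    """t statistic for g in the linear model y ~ 1 + pcs + g (Frisch-Waugh-Lovell)."""
    ry, rg = residualize(y, pcs), residualize(g, pcs)
    sgg = sum(a * a for a in rg)
    beta = sum(a * b for a, b in zip(rg, ry)) / sgg
    rss = sum((b - beta * a) ** 2 for a, b in zip(rg, ry))
    df = len(y) - 2 - len(pcs)  # intercept + g + one column per PC
    return beta / math.sqrt(rss / df / sgg)

def simulate(n_per_pop=30, n_markers=200, seed=7):
    """Hypothetical two-population data with a confounded, non-causal test variant."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_markers)]
    pops, genos, y, g_test = [], [], [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            # marker allele frequency is p in population 0 and 1 - p in population 1
            genos.append([sum(rng.random() < (p if pop == 0 else 1 - p) for _ in range(2))
                          for p in freqs])
            pops.append(pop)
            # test variant: frequency differs by population but has no effect on y
            g_test.append(sum(rng.random() < (0.1 if pop == 0 else 0.9) for _ in range(2)))
            # phenotype mean is shifted by population only (the confounder)
            y.append(pop + rng.gauss(0, 0.5))
    return pops, genos, y, g_test

pops, genos, y, g_test = simulate()
z, m = standardize(genos)
grm = [[sum(a * b for a, b in zip(zi, zk)) / m for zk in z] for zi in z]
pc1 = top_pc(grm)
t_raw = assoc_t(y, g_test)          # ignores stratification
t_adj = assoc_t(y, g_test, [pc1])   # adjusts for the top PC
print(f"unadjusted t = {t_raw:.2f}, PC1-adjusted t = {t_adj:.2f}")
```

In this toy setting the unadjusted test tends to report a large t statistic for a variant with no effect, because both the variant and the phenotype track population membership. Adding PC1 as a covariate removes most of that shared signal.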