Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.
Bioinformatics. 2011 Mar 1;27(5):670-7. doi: 10.1093/bioinformatics/btq709. Epub 2010 Dec 17.
Admixed populations offer a unique opportunity for mapping diseases that have large disease allele frequency differences between ancestral populations. However, association analysis in such populations is challenging because population stratification may lead to association with loci unlinked to the disease locus.
We show that local ancestry at a test single nucleotide polymorphism (SNP) can confound the association signal, and that ignoring it can lead to spurious association. We demonstrate theoretically that adjusting for local ancestry at the test SNP is sufficient to remove the spurious association regardless of the mechanism of population stratification, whether it arises from local or global ancestry differences among study subjects; global ancestry adjustment procedures, by contrast, may not be effective. We further develop two novel association tests that adjust for local ancestry. Our first test is based on a conditional likelihood framework that models the distribution of the test SNP given disease status and flanking marker genotypes. A key advantage of this test is its ability to accommodate different directions of association in the ancestral populations. Our second test, which is computationally simpler, is based on logistic regression with adjustment for the local ancestry proportion. In extensive simulations, the Type I error rates of our tests were controlled, whereas the global adjustment procedures yielded inflated Type I error rates when stratification was due to local ancestry differences.
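The logistic regression idea can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes the local ancestry proportion at the test SNP is known exactly (in practice it would be inferred from flanking markers), and every name and parameter value in it (`simulate`, `wald_z`, `demo`, the allele frequencies 0.8 and 0.2, the log-odds coefficients) is a hypothetical choice made for illustration. Disease risk depends only on local ancestry, so the test SNP has no effect of its own; the code compares the Wald statistic for the SNP with and without the local ancestry covariate.

```python
import math
import random


def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def score_and_information(X, y, beta):
    """Gradient and Fisher information of the logistic log-likelihood."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
        w = p * (1.0 - p)
        for j in range(k):
            grad[j] += (yi - p) * xi[j]
            for l in range(k):
                info[j][l] += w * xi[j] * xi[l]
    return grad, info


def wald_z(X, y, j, iterations=15):
    """Fit logistic regression by Newton-Raphson; return the Wald z of coefficient j."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad, info = score_and_information(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    _, info = score_and_information(X, y, beta)
    unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
    variance = solve(info, unit)[j]  # j-th diagonal entry of the inverse information
    return beta[j] / math.sqrt(variance)


def simulate(n=4000, seed=1):
    """Simulate (genotype, local ancestry proportion, disease) under the null for the SNP."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # True = haplotype inherited from ancestral population A at the test SNP (assumed known)
        haps = [rng.random() < 0.5 for _ in range(2)]
        # Hypothetical allele frequencies: 0.8 in population A, 0.2 in population B
        genotype = sum(rng.random() < (0.8 if h else 0.2) for h in haps)
        local_prop = sum(haps) / 2
        # Risk depends on local ancestry only; the SNP itself has no effect
        risk = 1.0 / (1.0 + math.exp(-(-1.0 + 1.6 * local_prop)))
        rows.append((genotype, local_prop, 1 if rng.random() < risk else 0))
    return rows


def demo(seed=1):
    """Return the SNP Wald z without and with local ancestry adjustment."""
    rows = simulate(seed=seed)
    y = [d for _, _, d in rows]
    unadjusted = wald_z([[1.0, g] for g, _, _ in rows], y, 1)
    adjusted = wald_z([[1.0, g, a] for g, a, _ in rows], y, 1)
    return unadjusted, adjusted


if __name__ == "__main__":
    z_unadjusted, z_adjusted = demo()
    print("unadjusted z:", round(z_unadjusted, 2))
    print("adjusted z:  ", round(z_adjusted, 2))
```

Under these assumptions the unadjusted statistic should be large even though the SNP is null, because genotype is correlated with local ancestry, while the adjusted statistic should be much closer to zero. The pure-Python Newton-Raphson fit only keeps the example dependency-free; a standard statistics package would normally be used instead.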