Qin Huaizhen, Zhu Xiaofeng
Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, 70112, USA.
Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, 44106, USA.
Methods Mol Biol. 2017;1666:441-453. doi: 10.1007/978-1-4939-7274-6_21.
In genetic association studies, population structure must be corrected for to avoid biased inference. Over the past decade, prevailing corrections have often adjusted only for global ancestry differences among sampled individuals. However, population structure may vary across local genomic regions because local ancestries vary with natural selection, migration, or random genetic drift. Adjusting for global ancestry alone may be inadequate when local population structure is an important confounding factor. In contrast, adjusting for local ancestry can more effectively prevent false positives caused by local population structure. To locate disease genes more accurately, we recommend adjusting for local ancestries by interrogating local structure. In practice, locus-specific ancestries are usually unknown and must be inferred. For recently admixed populations with known reference ancestral populations, locus-specific ancestries can be inferred accurately using hidden Markov model-based methods. However, SNP-wise ancestries cannot be inferred accurately when ancestral population information is not available. For such scenarios, we propose using local principal components (PCs) to represent local ancestries and adjusting for local PCs when testing for gene-phenotype association.
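As an illustration of the idea of adjusting for local PCs, the following is a minimal sketch and not the authors' implementation. It assumes genotypes coded as 0/1/2 allele counts, a fixed SNP window around the tested SNP, a quantitative phenotype, and ordinary least squares as the association model. The function names (`local_pcs`, `snp_t_statistic`), the window size, the number of PCs, and the choice to leave the tested SNP out of the window are all hypothetical choices made for this example.

```python
import numpy as np


def local_pcs(genotypes, center, half_window=50, n_pcs=2):
    """Top principal component scores of the SNPs in a window around `center`.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    The tested SNP itself is excluded from the window (an illustrative choice).
    """
    n_snps = genotypes.shape[1]
    lo = max(0, center - half_window)
    hi = min(n_snps, center + half_window + 1)
    cols = [j for j in range(lo, hi) if j != center]
    window = genotypes[:, cols].astype(float)
    window -= window.mean(axis=0)  # center each SNP column
    # Left singular vectors scaled by singular values give the PC scores.
    u, s, _ = np.linalg.svd(window, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def snp_t_statistic(phenotype, snp, covariates):
    """t statistic for the SNP effect in an OLS fit of phenotype ~ SNP + covariates."""
    n = len(phenotype)
    design = np.column_stack([np.ones(n), snp, covariates])
    xtx_inv = np.linalg.pinv(design.T @ design)
    beta = xtx_inv @ design.T @ phenotype
    resid = phenotype - design @ beta
    dof = n - np.linalg.matrix_rank(design)
    sigma2 = resid @ resid / dof  # residual variance estimate
    return beta[1] / np.sqrt(sigma2 * xtx_inv[1, 1])


if __name__ == "__main__":
    # Simulated data only; no real genotypes or phenotypes are used.
    rng = np.random.default_rng(1)
    geno = rng.integers(0, 3, size=(300, 200))
    test_snp = 100
    pheno = 0.4 * geno[:, test_snp] + rng.normal(size=300)
    pcs = local_pcs(geno, test_snp, half_window=25, n_pcs=2)
    t_stat = snp_t_statistic(pheno, geno[:, test_snp].astype(float), pcs)
    print(f"t statistic adjusted for local PCs: {t_stat:.2f}")
```

In this sketch the local PCs are recomputed for every tested SNP, so the covariates follow the structure of the surrounding region instead of a single genome-wide summary. A global-ancestry adjustment would, by contrast, use the same PCs for all SNPs.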