Department of Biostatistics, University of Washington, Seattle, 98195, USA.
Am J Hum Genet. 2012 Jul 13;91(1):122-38. doi: 10.1016/j.ajhg.2012.05.024. Epub 2012 Jun 28.
Genome-wide association studies (GWASs) are commonly used to map genetic loci that influence complex traits. A problem often encountered in both population-based and family-based GWASs is identifying cryptic relatedness and population stratification; it is well known that failure to appropriately account for pedigree and population structure can lead to spurious associations. A number of methods have been proposed for identifying relatives in samples from homogeneous populations. This strong assumption of population homogeneity, however, is often untenable, and many GWASs include samples from structured populations. Here, we consider the problem of estimating relatedness in structured populations with admixed ancestry. We propose a method, REAP (relatedness estimation in admixed populations), for robust estimation of identity by descent (IBD)-sharing probabilities and kinship coefficients in admixed populations. REAP appropriately accounts for population structure and ancestry-related assortative mating by using individual-specific allele frequencies at SNPs, calculated from ancestry estimates derived from whole-genome analysis. In simulation studies with related individuals and admixture from highly divergent populations, we demonstrate that REAP gives accurate IBD-sharing probabilities and kinship coefficients. We apply REAP to the Mexican Americans in Los Angeles, California (MXL) population sample of release 3 of phase III of the International Haplotype Map Project; in this sample, we identify third- and fourth-degree relatives who have not previously been reported. We also apply REAP to the African American and Hispanic samples from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) study, in which hundreds of pairs of cryptically related individuals have been identified.
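The abstract does not state REAP's formulas. As a minimal sketch, assuming K ancestral populations, per-individual ancestry proportions (as might be estimated by an admixture program from whole-genome data), and population-specific allele frequencies, an individual-specific allele frequency can be formed as the ancestry-weighted average of the population frequencies. A method-of-moments kinship estimate then compares each pair's genotype deviations from those individual-specific expectations. The function names below are hypothetical, and this is an illustration of the general idea, not REAP's actual implementation.

```python
import math
from typing import List, Sequence


def individual_allele_freqs(ancestry: Sequence[float],
                            pop_freqs: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: individual-specific allele frequency at each SNP,
    mu_s = sum_k ancestry[k] * pop_freqs[k][s] (assumed ancestry-weighted average)."""
    n_snps = len(pop_freqs[0])
    return [sum(a * pop_freqs[k][s] for k, a in enumerate(ancestry))
            for s in range(n_snps)]


def kinship(g_i: Sequence[int], g_j: Sequence[int],
            mu_i: Sequence[float], mu_j: Sequence[float]) -> float:
    """Hypothetical moment estimator of the kinship coefficient between i and j.

    g_i, g_j are genotypes coded as 0/1/2 copies of the reference allele;
    mu_i, mu_j are the individual-specific allele frequencies at the same SNPs.
    """
    num = 0.0
    den = 0.0
    for gi, gj, mi, mj in zip(g_i, g_j, mu_i, mu_j):
        # Skip SNPs where either expected frequency is 0 or 1 (no information).
        if not (0.0 < mi < 1.0 and 0.0 < mj < 1.0):
            continue
        # Covariance of deviations from each individual's expected dosage 2*mu.
        num += (gi - 2 * mi) * (gj - 2 * mj)
        # Normalize by the product of the binomial standard deviations.
        den += math.sqrt(mi * (1 - mi) * mj * (1 - mj))
    if den == 0.0:
        raise ValueError("no informative SNPs")
    return num / (4 * den)
```

For example, an individual with 50% ancestry from each of two populations whose allele frequencies at two SNPs are (0.2, 0.8) and (0.6, 0.4) gets individual-specific frequencies of about 0.4 and 0.6. Using individual-specific rather than pooled frequencies is what keeps shared ancestry from being mistaken for recent relatedness.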