Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA; Center for Statistical Genetics, Department of Biostatistics, University of Michigan, School of Public Health, Ann Arbor, MI 48109, USA.
Am J Hum Genet. 2013 Nov 7;93(5):891-9. doi: 10.1016/j.ajhg.2013.10.008.
Estimates of the ancestry of specific chromosomal regions in admixed individuals are useful for studies of human evolutionary history and for genetic association studies. Previously, this ancestry inference relied on high-quality genotypes from genome-wide association study (GWAS) arrays. These high-quality genotypes are not always available when samples are exome sequenced, and exome sequencing is the strategy of choice for many ongoing genetic studies. Here we show that off-target reads generated during exome-sequencing experiments can be combined with on-target reads to accurately estimate the ancestry of each chromosomal segment in an admixed individual. To reconstruct local ancestry, our method SEQMIX models aligned bases directly instead of relying on hard genotype calls. We evaluate the accuracy of our method through simulations and analysis of samples sequenced by the 1000 Genomes Project and the NHLBI Grand Opportunity Exome Sequencing Project. In African Americans, we show that local-ancestry estimates derived by our method are very similar to those derived with Illumina's Omni 2.5M genotyping array and much improved in relation to estimates that use only exome genotypes and ignore off-target sequencing reads. Software implementing this method, SEQMIX, can be applied to analysis of human population history or used for genetic association studies in admixed individuals.
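The idea of modeling aligned bases directly, rather than hard genotype calls, can be illustrated with a minimal Python sketch. This is not SEQMIX's implementation; the abstract does not specify the model. The function names (`genotype_likelihoods`, `ancestry_emission`), the uniform split of the error rate over the three other bases, the independence of reads, and the random pairing of alleles within an ancestry are all assumptions made for illustration.

```python
def genotype_likelihoods(bases, errors, ref, alt):
    """Return P(observed bases | genotype) for 0, 1, 2 copies of `alt`.

    bases  : observed base calls covering one site, e.g. ["A", "G"]
    errors : per-base error probabilities (same length as bases)
    Assumes reads are independent and an error is equally likely to
    produce any of the three other bases (illustrative assumptions).
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        p_from_alt = alt_copies / 2.0  # chance a read samples the alt chromosome
        lik = 1.0
        for base, err in zip(bases, errors):
            p_if_ref = (1.0 - err) if base == ref else err / 3.0
            p_if_alt = (1.0 - err) if base == alt else err / 3.0
            lik *= (1.0 - p_from_alt) * p_if_ref + p_from_alt * p_if_alt
        likelihoods.append(lik)
    return likelihoods


def ancestry_emission(bases, errors, ref, alt, freq_a, freq_b, n_a):
    """Return P(observed bases | n_a chromosomes from population A).

    freq_a, freq_b : alt-allele frequencies in the two source populations
    n_a            : number of chromosomes (0, 1 or 2) with ancestry A
    Sums over the unknown genotype instead of committing to a hard call.
    """
    f1 = freq_a if n_a >= 1 else freq_b  # alt frequency on chromosome 1
    f2 = freq_a if n_a == 2 else freq_b  # alt frequency on chromosome 2
    genotype_prior = [
        (1.0 - f1) * (1.0 - f2),
        f1 * (1.0 - f2) + (1.0 - f1) * f2,
        f1 * f2,
    ]
    gl = genotype_likelihoods(bases, errors, ref, alt)
    return sum(p * l for p, l in zip(genotype_prior, gl))
```

Under these assumptions, a site covered by a single off-target read still contributes a small amount of evidence: one read carrying an allele that is common in population A and rare in population B makes `n_a = 2` more likely than `n_a = 0`, even though no confident genotype could be called from it. A complete method would also have to combine such per-site evidence along the chromosome, which is not sketched here.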