Shi Gang, Kuang Qingmin
State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China.
Front Genet. 2021 Sep 27;12:724638. doi: 10.3389/fgene.2021.724638. eCollection 2021.
With advances in sequencing technology, an increasing number of populations have been sequenced to study the histories of worldwide populations, including their divergence, admixture, migration, and effective sizes. The variants detected in sequencing studies are mostly rare and largely population specific. Population-specific variants are often recent mutations and are informative for revealing substructure and admixture in populations; however, computational methods and tools for analyzing them are still lacking. In this work, we propose using reference populations and single nucleotide polymorphisms (SNPs) specific to those reference populations. We propose ancestral information, the best linear unbiased estimator (BLUE) of the ancestral proportion, which can be used to infer ancestral proportions in recently admixed target populations and to measure how well the reference populations serve as proxies for the admixing sources. When based on the same panel of SNPs, the ancestral information is comparable across samples from different studies and is not affected by genetic outliers, related samples, or the sample sizes of the admixed target populations. In addition, the ancestral spectrum is useful for detecting genetic outliers and for exploring co-ancestry between study samples and the reference populations. The methods are implemented in a program, Ancestral Spectrum Analyzer (ASA), and are applied to high-coverage sequencing data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP). In analyses of American populations from the 1000 Genomes Project, we demonstrate that recent admixture can be distinguished from ancient admixture by comparing ancestral spectra obtained with and without indigenous Americans included in the reference populations.
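To make the idea of a linear unbiased estimator built from population-specific SNPs concrete, the following is a minimal sketch and not the ASA implementation. It rests on assumptions that do not come from the abstract: each SNP is specific to a single reference population, an individual's alternate-allele count g_j (0, 1, or 2) has expectation 2·a·p_j, where a is the ancestral proportion and p_j is the allele frequency in the reference population, and the variance of g_j is proportional to p_j (a rare-variant approximation). Under that assumed model, weighted least squares reduces to the ratio of the summed allele counts to twice the summed reference frequencies. The function name and arguments are hypothetical.

```python
import numpy as np


def estimate_ancestral_proportion(genotypes, ref_freqs):
    """Weighted least-squares estimate of one ancestral proportion.

    Assumed model (illustrative, not taken from the paper):
        E[g_j]   = 2 * a * p_j
        Var[g_j] is proportional to p_j
    With weights 1 / p_j, the estimator simplifies to
        a_hat = sum(g_j) / (2 * sum(p_j)).
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(ref_freqs, dtype=float)
    if g.shape != p.shape:
        raise ValueError("genotypes and ref_freqs must have the same shape")
    if g.size == 0:
        raise ValueError("at least one SNP is required")
    if np.any(p <= 0.0) or np.any(p > 1.0):
        raise ValueError("reference allele frequencies must lie in (0, 1]")
    return g.sum() / (2.0 * p.sum())


if __name__ == "__main__":
    # Simulated data only: 50,000 rare SNPs specific to one hypothetical
    # reference population, and one individual with a true proportion of 0.3.
    rng = np.random.default_rng(0)
    ref_freqs = rng.uniform(0.001, 0.05, size=50_000)
    true_a = 0.3
    genotypes = rng.binomial(2, true_a * ref_freqs)
    print(estimate_ancestral_proportion(genotypes, ref_freqs))
```

Under the same assumptions, repeating the calculation with the SNP panel specific to each reference population in turn would give one estimate per population. Those estimates need not sum to one, which is consistent with the abstract's point that they also indicate how well the reference populations serve as proxies for the admixing sources.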