Song Shiya, Sliwerska Elzbieta, Emery Sarah, Kidd Jeffrey M
Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan 48109.
Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan 48109.
Genetics. 2017 Jan;205(1):385-395. doi: 10.1534/genetics.116.192963. Epub 2016 Nov 9.
Phased haplotype sequences are a key component in many population genetic analyses since variation in haplotypes reflects the action of recombination, selection, and changes in population size. In humans, haplotypes are typically estimated from unphased sequence or genotyping data using statistical models applied to large reference panels. To assess the importance of correct haplotype phase on population history inference, we performed fosmid pool sequencing and resolved phased haplotypes of five individuals from diverse African populations (including Yoruba, Esan, Gambia, Maasai, and Mende). We physically phased 98% of heterozygous SNPs into haplotype-resolved blocks, obtaining a block N50 of 1 Mbp. We combined these data with additional phased genomes from San, Mbuti, Gujarati, and Centre d'Etude du Polymorphisme Humain (CEPH) European populations and analyzed population size and separation history using the pairwise sequentially Markovian coalescent and multiple sequentially Markovian coalescent models. We find that statistically phased haplotypes yield more recent split-time estimates than experimentally phased haplotypes. To better interpret patterns of cross-population coalescence, we implemented an approximate Bayesian computation approach to estimate population split times and migration rates by fitting the distribution of coalescent times inferred between two haplotypes, one from each population, to a standard isolation-with-migration model. We inferred that the separation between hunter-gatherer populations and other populations happened ∼120-140 KYA, with gene flow continuing until 30-40 KYA; separation between west-African and out-of-Africa populations happened ∼70-80 KYA; while the separation between Maasai and out-of-Africa populations happened ∼50 KYA.
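The approximate Bayesian computation step described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes a clean split with no migration (the study fits an isolation-with-migration model), a single constant ancestral population size, a uniform prior on the split time, and quartiles of the coalescence-time distribution as summary statistics. All function names, parameter names, and numeric values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_cross_coalescence(split_time, ancestral_size, n_loci, rng):
    """Simulate coalescence times (in generations) for one haplotype from
    each population under a clean-split model (assumption: no migration).

    The two lineages cannot coalesce before the split; afterwards the
    waiting time is exponential with mean 2 * ancestral_size generations.
    """
    rate = 1.0 / (2.0 * ancestral_size)
    return [split_time + rng.expovariate(rate) for _ in range(n_loci)]


def summary(times):
    """Summary statistics: the three quartiles of the time distribution."""
    return statistics.quantiles(times, n=4)


def abc_rejection(observed, prior_low, prior_high, ancestral_size,
                  n_draws, keep_fraction, rng):
    """Rejection ABC: draw split times from a uniform prior, simulate,
    and keep the draws whose summaries lie closest to the observed ones."""
    obs = summary(observed)
    scored = []
    for _ in range(n_draws):
        t = rng.uniform(prior_low, prior_high)
        sim = summary(
            simulate_cross_coalescence(t, ancestral_size, len(observed), rng)
        )
        dist = sum((a - b) ** 2 for a, b in zip(sim, obs)) ** 0.5
        scored.append((dist, t))
    scored.sort()
    n_keep = max(1, int(keep_fraction * n_draws))
    return [t for _, t in scored[:n_keep]]


# Toy usage with made-up values (not taken from the study).
rng = random.Random(1)
true_split = 3000      # generations, hypothetical
ancestral_size = 5000  # diploid individuals, hypothetical
observed = simulate_cross_coalescence(true_split, ancestral_size, 1000, rng)
accepted = abc_rejection(observed, 0, 10000, ancestral_size,
                         n_draws=1000, keep_fraction=0.05, rng=rng)
print("approximate posterior median split time:",
      statistics.median(accepted))
```

The accepted draws approximate the posterior distribution of the split time under these assumptions. Extending the sketch toward the approach in the abstract would require a simulator that allows coalescence before the split through migration, together with a prior on the migration rate.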