Department of Mathematics, School of Science, Beijing Jiaotong University, Beijing, China.
Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, CAS, Shanghai, China.
Heredity (Edinb). 2018 Jul;121(1):52-63. doi: 10.1038/s41437-017-0041-2. Epub 2018 Jan 23.
Ancestral tracks in admixed genomes are valuable for inferring population history. Although a few methods have been developed to infer admixture history from ancestral tracks, they suffer from the same flaw: they can infer admixture history only under certain specific models. As a result, the inferred history may be biased or even unreliable if the assumed model deviates from the true history. To address this problem, we first proposed a general discrete admixture model to describe admixture histories with multiple ancestral populations and multiple waves of admixture. We then derived the length distribution of ancestral tracks under this general discrete admixture model. Based on this, we developed a new method, MultiWaver, to explore multiple-wave admixture histories. The method automatically determines an optimal admixture model from the length distribution of ancestral tracks and estimates the corresponding parameters under that model. Specifically, we used a likelihood ratio test (LRT) to determine the number of admixture waves and implemented an expectation-maximization (EM) algorithm to estimate the parameters. We validated the reliability and effectiveness of the method with simulation studies. Finally, the method performed well when applied to real data sets of African Americans and Mexicans, and it provided new insights into the admixture history of Uyghurs and Hazaras.
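The combination of an LRT for choosing the number of waves and EM for estimating parameters can be illustrated with a minimal sketch. This is not the MultiWaver implementation. It assumes, purely for illustration, that track lengths follow a mixture of exponential distributions with one component per wave, and it uses a chi-square cutoff with 2 degrees of freedom as a heuristic threshold for adding a component. The function names `fit_exponential_mixture` and `select_num_waves`, the initialization, and the stopping rule are all hypothetical.

```python
import math


def fit_exponential_mixture(lengths, k, iters=300):
    """EM for a k-component exponential mixture (illustrative assumption).

    Returns (weights, rates, log_likelihood).
    """
    n = len(lengths)
    mean = sum(lengths) / n
    # Spread the initial rates geometrically around 1 / mean.
    rates = [2.0 ** (j - (k - 1) / 2.0) / mean for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp_sum = [0.0] * k   # sum of responsibilities per component
        resp_len = [0.0] * k   # responsibility-weighted sum of lengths
        # E-step: posterior probability of each component for each track.
        for x in lengths:
            dens = [w * r * math.exp(-r * x) for w, r in zip(weights, rates)]
            total = sum(dens)
            for j in range(k):
                g = dens[j] / total
                resp_sum[j] += g
                resp_len[j] += g * x
        # M-step: closed-form updates for mixing weights and rates.
        weights = [s / n for s in resp_sum]
        rates = [s / max(sx, 1e-300) for s, sx in zip(resp_sum, resp_len)]
    loglik = sum(
        math.log(sum(w * r * math.exp(-r * x) for w, r in zip(weights, rates)))
        for x in lengths
    )
    return weights, rates, loglik


def select_num_waves(lengths, max_k=4, alpha=0.05):
    """Add components while the likelihood ratio statistic exceeds a cutoff.

    The cutoff is the (1 - alpha) quantile of a chi-square distribution with
    2 degrees of freedom, which equals -2 * ln(alpha). Treating the statistic
    as chi-square distributed is a heuristic assumption for mixtures.
    """
    threshold = -2.0 * math.log(alpha)
    best = fit_exponential_mixture(lengths, 1)
    k = 1
    while k < max_k:
        candidate = fit_exponential_mixture(lengths, k + 1)
        statistic = 2.0 * (candidate[2] - best[2])
        if statistic < threshold:
            break
        best = candidate
        k += 1
    return k, best
```

Calling `select_num_waves(track_lengths)` on a list of positive track lengths returns the selected number of components together with the fitted weights, rates, and log-likelihood. How those rates and weights map to admixture times and proportions depends on the admixture model and is not shown here.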