Bruce Rannala, Ziheng Yang
Department of Medical Genetics, University of Alberta, Edmonton, Alberta T6G 2H7, Canada.
Genetics. 2003 Aug;164(4):1645-56. doi: 10.1093/genetics/164.4.1645.
The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased because it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be approximately 20,000, with a 95% credibility interval (8,000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates must await more data and more realistic models.
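To make the idea of "integrating over uncertain coalescence times at each locus" concrete, here is a much-simplified sketch; it is not the authors' program. It assumes only one population, two sequences per locus, a population-size parameter θ = 4Nμ, a coalescent time t (in expected substitutions per site) drawn from an exponential distribution with mean θ/2, and the JC69 likelihood for the number of differences at each locus. It omits species divergence times, multiple species, and gene-tree topologies. The names `simulate`, `mcmc_theta`, the Gamma prior on θ, the starting values and the proposal step sizes are all hypothetical choices for this illustration. Converting a θ estimate to a population size N additionally requires a per-generation mutation rate μ, via N = θ/(4μ).

```python
import math
import random


def jc69_log_lik(x, n, t):
    """Log-likelihood (up to a constant) of x differences at n sites between two
    sequences whose common ancestor is t expected substitutions/site back (JC69)."""
    p = 0.75 - 0.75 * math.exp(-8.0 * t / 3.0)
    return x * math.log(p) + (n - x) * math.log(1.0 - p)


def log_coal_prior(t, theta):
    """Coalescent prior for two sequences: t ~ Exponential with mean theta/2."""
    return math.log(2.0 / theta) - 2.0 * t / theta


def simulate(n_loci, n_sites, theta, rng):
    """Simulate difference counts at independent loci (hypothetical test data)."""
    data = []
    for _ in range(n_loci):
        t = rng.expovariate(2.0 / theta)  # rate 2/theta, i.e. mean theta/2
        p = 0.75 - 0.75 * math.exp(-8.0 * t / 3.0)
        data.append(sum(rng.random() < p for _ in range(n_sites)))
    return data


def mcmc_theta(data, n_sites, a=2.0, b=100.0, n_iter=4000, burnin=1000, seed=1):
    """Metropolis-Hastings sampler for theta that also samples (and so integrates
    over) the unknown coalescent time at every locus.
    Assumed prior: theta ~ Gamma(shape=a, rate=b)."""
    rng = random.Random(seed)
    theta = 0.005
    # Start each locus near its naive distance-based estimate, t ~ x / (2n).
    times = [(x + 0.5) / (2.0 * n_sites) for x in data]
    samples = []

    def log_post_theta(th):
        # Gamma prior on theta plus the coalescent prior of all current times.
        s = sum(times)
        return ((a - 1.0) * math.log(th) - b * th
                + len(times) * math.log(2.0 / th) - 2.0 * s / th)

    for it in range(n_iter):
        # Multiplicative proposal on theta; its Hastings ratio is new/old.
        new = theta * math.exp(0.3 * (rng.random() - 0.5))
        log_r = log_post_theta(new) - log_post_theta(theta) + math.log(new / theta)
        if math.log(1.0 - rng.random()) < log_r:
            theta = new
        # Update each locus's coalescent time given the current theta.
        for i, x in enumerate(data):
            t = times[i]
            t_new = t * math.exp(1.0 * (rng.random() - 0.5))
            log_r = (jc69_log_lik(x, n_sites, t_new) + log_coal_prior(t_new, theta)
                     - jc69_log_lik(x, n_sites, t) - log_coal_prior(t, theta)
                     + math.log(t_new / t))
            if math.log(1.0 - rng.random()) < log_r:
                times[i] = t_new
        if it >= burnin:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    rng = random.Random(42)
    true_theta, n_sites = 0.01, 1000
    data = simulate(100, n_sites, true_theta, rng)
    draws = sorted(mcmc_theta(data, n_sites))
    mean = sum(draws) / len(draws)
    lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
    print(f"posterior mean theta = {mean:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

Log-scale (multiplicative) proposals are used so that θ and every t stay positive without rejection at zero. The full model described in the abstract adds species divergence times and gene-tree topologies to the parameters updated in this loop.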