Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, USA.
Mol Biol Evol. 2013 Mar;30(3):713-24. doi: 10.1093/molbev/mss265. Epub 2012 Nov 22.
Effective population size is fundamental in population genetics and characterizes genetic diversity. To infer past population dynamics from molecular sequence data, coalescent-based models have been developed for Bayesian nonparametric estimation of effective population size over time. Among the most successful is a Gaussian Markov random field (GMRF) model for a single gene locus. Here, we present a generalization of the GMRF model that allows for the analysis of multilocus sequence data. Using simulated data, we demonstrate that our method recovers true population trajectories and the time to the most recent common ancestor (TMRCA) more accurately. We analyze a multilocus alignment of HIV-1 CRF02_AG gene sequences sampled from Cameroon. Our results are consistent with HIV prevalence data and uncover some aspects of the population history that go undetected in Bayesian parametric estimation. Finally, we recover an older and more reconcilable TMRCA for a classic ancient DNA data set.
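To make the multilocus idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a piecewise-constant effective population size on a fixed time grid, samples all taken at time 0, independent loci that share one trajectory, a first-order random-walk GMRF prior on the log effective population sizes, and a fixed precision `tau`. All function and parameter names are hypothetical.

```python
import bisect
import math


def integrated_inverse_ne(a, b, grid, log_ne):
    """Integrate 1/N_e(t) over [a, b] for a piecewise-constant N_e.

    grid holds the K-1 change points; log_ne holds the K log sizes.
    """
    edges = [0.0] + list(grid) + [math.inf]
    total = 0.0
    for k in range(len(log_ne)):
        lo, hi = max(a, edges[k]), min(b, edges[k + 1])
        if hi > lo:
            total += (hi - lo) * math.exp(-log_ne[k])
    return total


def coalescent_loglik(coal_times, grid, log_ne):
    """Coalescent log-likelihood of one locus (assumption: all samples at time 0).

    coal_times are the increasing coalescent event times, measured
    backward from the present, so the locus has len(coal_times) + 1 samples.
    """
    lineages = len(coal_times) + 1
    loglik, t_prev = 0.0, 0.0
    for t in coal_times:
        pairs = lineages * (lineages - 1) / 2
        # Probability that no coalescence happens during (t_prev, t).
        loglik -= pairs * integrated_inverse_ne(t_prev, t, grid, log_ne)
        # Density contribution of the coalescence at time t.
        k = bisect.bisect_right(grid, t)
        loglik += math.log(pairs) - log_ne[k]
        t_prev, lineages = t, lineages - 1
    return loglik


def gmrf_log_prior(log_ne, tau):
    """Random-walk GMRF log-prior on log N_e, up to an additive constant."""
    sq = sum((b - a) ** 2 for a, b in zip(log_ne, log_ne[1:]))
    return 0.5 * (len(log_ne) - 1) * math.log(tau) - 0.5 * tau * sq


def multilocus_log_posterior(loci, grid, log_ne, tau):
    """Sum the per-locus log-likelihoods (assumed independent), add the prior."""
    loglik = sum(coalescent_loglik(times, grid, log_ne) for times in loci)
    return loglik + gmrf_log_prior(log_ne, tau)


if __name__ == "__main__":
    # Hypothetical toy input: two loci, one change point at t = 0.4.
    loci = [[0.2, 0.7], [0.1, 0.5, 1.3]]
    print(multilocus_log_posterior(loci, [0.4], [0.0, math.log(2.0)], tau=1.0))
```

In this sketch each additional locus contributes its own set of coalescent times to the same trajectory, which is the intuition behind why multilocus data can sharpen the estimate. A full analysis would also sample genealogies, the trajectory, and `tau` by MCMC, which is omitted here.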