Palacios Julia A, Minin Vladimir N
Department of Statistics, University of Washington, Seattle, Washington 98195-4322, USA.
Biometrics. 2013 Mar;69(1):8-18. doi: 10.1111/biom.12003. Epub 2013 Feb 14.
Changes in population size influence the genetic diversity of a population and, as a result, leave a signature in the genomes of its individuals. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as the glue between the population's demographic history and the genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. If these coalescent times are viewed as a point process, estimating a population size trajectory is equivalent to estimating the conditional intensity of that point process. Our inverse problem is therefore similar to estimating the intensity function of an inhomogeneous Poisson process. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human influenza A viruses. In both cases, we recover more of the believed aspects of the viral demographic histories than the GMRF approach does. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method.
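The link between coalescent times and population size can be illustrated with a minimal sketch. It is not the GP method of the abstract: it assumes a constant effective population size, a genealogy sampled at a single time point, and time measured in units where each pair of lineages coalesces at rate 1/N. Under those assumptions, the waiting time while k lineages remain is exponential with rate C(k,2)/N, and the maximum likelihood estimate of N has a closed form. All function and variable names below are hypothetical and chosen for illustration only.

```python
import math
import random


def simulate_coalescent_waits(n_samples, pop_size, rng):
    """Simulate inter-coalescent waiting times for a constant population size.

    Assumption: while k lineages remain, the next coalescence occurs after an
    exponential waiting time with rate C(k, 2) / pop_size.
    Returns a list of (k, waiting_time) pairs, from k = n_samples down to 2.
    """
    waits = []
    for k in range(n_samples, 1, -1):
        rate = math.comb(k, 2) / pop_size
        waits.append((k, rng.expovariate(rate)))
    return waits


def log_likelihood(waits, pop_size):
    """Log-likelihood of the waiting times under a constant population size."""
    total = 0.0
    for k, w in waits:
        rate = math.comb(k, 2) / pop_size
        total += math.log(rate) - rate * w
    return total


def mle_pop_size(waits):
    """Closed-form maximizer of log_likelihood: mean of C(k, 2) * w_k."""
    return sum(math.comb(k, 2) * w for k, w in waits) / len(waits)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
waits = simulate_coalescent_waits(n_samples=50, pop_size=1000.0, rng=rng)
n_hat = mle_pop_size(waits)
print(f"estimated population size: {n_hat:.1f}")
```

When the population size varies over time, the constant rate C(k,2)/N is replaced by the time-dependent intensity C(k,2)/N(t). Estimating that intensity is the problem the abstract addresses, with a Gaussian process prior placed on the trajectory instead of a single constant.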