Karcher Michael D, Palacios Julia A, Bedford Trevor, Suchard Marc A, Minin Vladimir N
Department of Statistics, University of Washington, Seattle, Washington, United States of America.
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America.
PLoS Comput Biol. 2016 Mar 3;12(3):e1004789. doi: 10.1371/journal.pcbi.1004789. eCollection 2016 Mar.
Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task is to formulate an observed sequence data likelihood using a coalescent model for the sampled individuals' genealogy and then either integrate over all possible genealogies via Monte Carlo or, less efficiently, condition on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically relevant, seasonal human influenza examples.
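To make the idea of preferential sampling concrete, the following is a minimal sketch of drawing sampling times from an inhomogeneous Poisson process whose intensity depends on effective population size. It is an illustration only, not the authors' implementation. The intensity form `scale * ne(t)**power`, the seasonal trajectory `seasonal_ne`, and all parameter names and values are assumptions introduced here; the simulation uses the standard thinning technique for inhomogeneous Poisson processes.

```python
import math
import random


def seasonal_ne(t, baseline=100.0, amplitude=0.8, period=1.0):
    """Hypothetical seasonal effective population size (always positive)."""
    return baseline * (1.0 + amplitude * math.cos(2.0 * math.pi * t / period))


def sample_times(ne, t_start, t_end, scale, power, ne_max, rng):
    """Draw event times on [t_start, t_end] by thinning.

    Assumed intensity: scale * ne(t) ** power, with ne(t) <= ne_max on the
    interval. power = 0 gives sampling that ignores population size.
    """
    if power < 0:
        raise ValueError("this sketch bounds the intensity only for power >= 0")
    lam_max = scale * ne_max ** power  # upper bound on the intensity
    times = []
    t = t_start
    while True:
        # Propose the next event from a homogeneous process with rate lam_max.
        t += rng.expovariate(lam_max)
        if t > t_end:
            break
        # Keep the proposal with probability intensity(t) / lam_max.
        if rng.random() < scale * ne(t) ** power / lam_max:
            times.append(t)
    return times


rng = random.Random(1)
times = sample_times(seasonal_ne, 0.0, 5.0, scale=0.2, power=1.0,
                     ne_max=180.0, rng=rng)

# Count samples falling in the high-population half of each season.
peak = sum(1 for t in times if math.cos(2.0 * math.pi * t) > 0)
trough = len(times) - peak
print(len(times), peak, trough)
```

With `power=1.0`, sampled times cluster around seasonal peaks of the assumed trajectory, which is the kind of dependence between sampling times and population size that the abstract describes; setting `power=0.0` removes that dependence.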