Department of Zoology, University of Oxford, Oxford, United Kingdom.
Department of Infectious Disease Epidemiology, MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, United Kingdom.
Mol Biol Evol. 2020 Aug 1;37(8):2414-2429. doi: 10.1093/molbev/msaa016.
Estimating past population dynamics from molecular sequences that have been sampled longitudinally through time is an important problem in infectious disease epidemiology, molecular ecology, and macroevolution. Popular solutions, such as the skyline and skygrid methods, infer past effective population sizes from the coalescent event times of phylogenies reconstructed from sampled sequences but assume that sequence sampling times are uninformative about population size changes. Recent work has started to question this assumption by exploring how sampling time information can aid coalescent inference. Here, we develop, investigate, and implement a new skyline method, termed the epoch sampling skyline plot (ESP), to jointly estimate the dynamics of population size and sampling rate through time. The ESP is inspired by real-world data collection practices and comprises a flexible model in which the sequence sampling rate is proportional to the population size within an epoch but can change discontinuously between epochs. We show that the ESP is accurate under several realistic sampling protocols and we prove analytically that it can at least double the best precision achievable by standard approaches. We generalize the ESP to incorporate phylogenetic uncertainty in a new Bayesian package (BESP) in BEAST2. We re-examine two well-studied empirical data sets from virus epidemiology and molecular evolution and find that the BESP improves upon previous coalescent estimators and generates new, biologically useful insights into the sampling protocols underpinning these data sets. Sequence sampling times provide a rich source of information for coalescent inference that will become increasingly important as sequence collection intensifies and becomes more formalized.
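As a rough illustration of the modelling idea described above, the sketch below evaluates a joint log-likelihood for a single epoch in which the effective population size is constant. It is not the ESP or BESP implementation. It assumes the standard Kingman coalescent rate for k lineages, C(k,2)/N, and a Poisson sampling process with rate beta * N, which is one way to read "sampling rate proportional to the population size within an epoch". The function name `joint_log_likelihood`, the argument names, and the interval encoding are all hypothetical choices made for this example.

```python
import math

def joint_log_likelihood(n_eff, beta, intervals):
    """Joint coalescent + sampling log-likelihood for one epoch (illustrative).

    Assumptions (not taken from the paper's code):
      - n_eff: constant effective population size within the epoch
      - beta:  sampling intensity, so the sampling rate is beta * n_eff
      - intervals: list of (duration, lineages, event) tuples, where
        `event` is "coalescent", "sample", or None for the event that
        ends the interval (None means the epoch boundary ended it)
    """
    log_lik = 0.0
    for duration, lineages, event in intervals:
        pairs = lineages * (lineages - 1) / 2   # C(k, 2) lineage pairs
        coal_rate = pairs / n_eff               # Kingman coalescent rate
        samp_rate = beta * n_eff                # sampling proportional to N
        # Probability that neither process fires during the interval.
        log_lik -= (coal_rate + samp_rate) * duration
        # Density contribution of the event that closes the interval.
        if event == "coalescent":
            log_lik += math.log(coal_rate)
        elif event == "sample":
            log_lik += math.log(samp_rate)
    return log_lik

# Hypothetical toy data: two lineages coalesce after 1.0 time units.
toy_intervals = [(1.0, 2, "coalescent")]
toy_value = joint_log_likelihood(1.0, 1.0, toy_intervals)
```

Under these assumptions, both the coalescent events and the sampling events carry information about `n_eff`, which is the intuition behind using sampling times in the inference. A multi-epoch version would sum such terms over epochs, each with its own `beta`.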