Departments of Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain.
Department of Computational Biology, Cornell University, Ithaca, New York, United States of America.
PLoS Comput Biol. 2023 Mar 20;19(3):e1010897. doi: 10.1371/journal.pcbi.1010897. eCollection 2023 Mar.
The coalescent is a powerful statistical framework that allows us to infer past population dynamics by leveraging the ancestral relationships reconstructed from sampled molecular sequence data. In many biomedical applications, such as the study of infectious diseases, cell development, and tumorigenesis, several distinct populations share an evolutionary history and are therefore dependent. Inferring this dependence is highly important yet challenging. With advances in sequencing technologies, we are well positioned to exploit the wealth of high-resolution biological data to tackle this problem. Here, we present adaPop, a probabilistic model that estimates the past population dynamics of dependent populations and quantifies their degree of dependence. An essential feature of our approach is its ability to track the time-varying association between the populations while making minimal assumptions about its functional shape, via Markov random field priors. We provide nonparametric estimators, extensions of our base model that integrate multiple data sources, and fast, scalable inference algorithms. We test our method on simulated data under various dependent population histories and demonstrate the utility of our model in shedding light on the evolutionary histories of different SARS-CoV-2 variants.
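The abstract does not state which Markov random field prior adaPop uses, so the following is only an illustrative sketch of one common choice in coalescent smoothing priors: a first-order random walk (RW1) Gaussian Markov random field. Here it is imagined as a prior on a quantity evaluated on a time grid, for instance a hypothetical time-varying log-ratio of two populations' effective sizes. The function name `rw1_log_prior` and the grid values are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def rw1_log_prior(x: Sequence[float], tau: float) -> float:
    """Log-density of an intrinsic first-order random-walk (RW1) GMRF prior.

    Hypothetical helper (not adaPop's code). x holds values on a time grid,
    e.g. an assumed log-ratio of two effective population sizes; tau is the
    precision. Only successive differences x[i+1] - x[i] are penalised, which
    favours smooth trajectories without imposing a parametric shape.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    n = len(x)
    if n < 2:
        return 0.0  # no increments to penalise
    sq = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1))
    k = n - 1  # rank of the RW1 precision matrix
    return 0.5 * k * (math.log(tau) - math.log(2 * math.pi)) - 0.5 * tau * sq


smooth = [0.0, 0.1, 0.2, 0.3]
jagged = [0.0, 1.0, -1.0, 1.0]
print(rw1_log_prior(smooth, 1.0) > rw1_log_prior(jagged, 1.0))
```

Because only successive differences enter, adding the same constant to every grid value leaves the log-density unchanged, and a larger `tau` penalises abrupt changes in the trajectory more strongly.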