Suppr超能文献

从任意大种群中进行精确且高效的系统动力学模拟。

Exact and efficient phylodynamic simulation from arbitrarily large populations.

作者信息

Celentano Michael, DeWitt William S, Prillo Sebastian, Song Yun S

机构信息

Department of Statistics, University of California, Berkeley.

Computer Science Division, University of California, Berkeley.

出版信息

ArXiv. 2024 Aug 10:arXiv:2402.17153v2.

Abstract

Many biological studies involve inferring the evolutionary history of a sample of individuals from a large population and interpreting the reconstructed tree. Such an ascertained tree typically represents only a small part of a comprehensive population tree and is distorted by survivorship and sampling biases. Inferring evolutionary parameters from ascertained trees requires modeling both the underlying population dynamics and the ascertainment process. A crucial component of this phylodynamic modeling involves tree simulation, which is used to benchmark probabilistic inference methods. To simulate an ascertained tree, one must first simulate the full population tree and then prune unobserved lineages. Consequently, the computational cost is determined not by the size of the final simulated tree, but by the size of the population tree in which it is embedded. In most biological scenarios, simulations of the entire population are prohibitively expensive due to computational demands placed on lineages without sampled descendants. Here, we address this challenge by proving that, for any partially ascertained process from a general multi-type birth-death-mutation-sampling model, there exists an equivalent process with and , a property which we leverage to develop a highly efficient algorithm for simulating trees. Our algorithm scales linearly with the size of the final simulated tree and is independent of the population size, enabling simulations from extremely large populations beyond the reach of current methods but essential for various biological applications. We anticipate that this unprecedented speedup will significantly advance the development of novel inference methods that require extensive training data.

摘要

许多生物学研究涉及从大量种群中推断个体样本的进化历史,并解读重建的树。这样一个确定的树通常只代表综合种群树的一小部分,并且会因生存偏差和抽样偏差而扭曲。从确定的树推断进化参数需要对潜在的种群动态和确定过程进行建模。这种系统发育动力学建模的一个关键组成部分涉及树模拟,它被用于对概率推断方法进行基准测试。为了模拟一个确定的树,必须首先模拟完整的种群树,然后修剪未观察到的谱系。因此,计算成本不是由最终模拟树的大小决定的,而是由它所嵌入的种群树的大小决定的。在大多数生物学场景中,由于对没有抽样后代的谱系的计算需求,对整个种群进行模拟成本过高。在这里,我们通过证明对于一般多类型出生-死亡-突变-抽样模型的任何部分确定过程,都存在一个等效过程,其具有 和 ,我们利用这一特性开发了一种高效的树模拟算法。我们的算法与最终模拟树的大小成线性比例,并且与种群大小无关,能够对当前方法无法处理的极大型种群进行模拟,但这对于各种生物学应用至关重要。我们预计这种前所未有的加速将显著推动需要大量训练数据的新型推断方法的发展。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4c7/11421513/c523d3f4311f/nihpp-2402.17153v2-f0004.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验