Cappello Lorenzo, Lo Wai Tung 'Jack', Zhang Joy Z, Xu Peiyu, Barrow Daniel, Chopra Ishani, Clark Andrew G, Wells Martin T, Kim Jaehee
Departments of Economics and Business, Universitat Pompeu Fabra, Barcelona 08005, Spain.
Data Science Center, Barcelona School of Economics, Barcelona 08005, Spain.
Proc Natl Acad Sci U S A. 2025 May 6;122(18):e2501394122. doi: 10.1073/pnas.2501394122. Epub 2025 May 2.
Many organisms employ reversible dormancy, or seedbank, in response to environmental fluctuations. This life-history strategy alters fundamental ecoevolutionary forces, leading to distinct patterns of genetic diversity. Two models of dormancy have been proposed based on the average duration of dormancy relative to coalescent timescales: weak seedbank, induced by scheduled seasonality (e.g., plants, invertebrates), and strong seedbank, where individuals stochastically switch between active and dormant states (e.g., bacteria, fungi). The weak seedbank coalescent is statistically equivalent to the Kingman coalescent with a scaled mutation rate, allowing the use of existing inference methods. In contrast, the strong seedbank coalescent differs fundamentally, as only active lineages can coalesce, while dormant lineages cannot. Additionally, dormant individuals typically mutate at a slower rate than active ones. Consequently, despite the significant role of dormancy in the ecoevolutionary dynamics of many organisms, no methods currently exist for inferring population dynamics involving dormancy and associated parameters. We present a Bayesian framework for jointly inferring a latent genealogy, seedbank parameters, and evolutionary parameters from molecular sequence data under the strong seedbank coalescent. We derive the exact probability density of genealogies sampled under the strong seedbank coalescent, characterize the corresponding likelihood function, and present efficient computational algorithms for its evaluation based on our theoretical framework. We develop a tailored Markov chain Monte Carlo sampler and implement our inference framework as a package SeedbankTree within BEAST2. Our work provides both a theoretical foundation and practical inference framework for studying the population genetic and genealogical impacts of dormancy.
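The mechanics described above (only active lineages coalesce, lineages switch stochastically between states, and dormant lineages mutate more slowly) can be illustrated with a small backward-in-time simulation. This is a minimal sketch and not the SeedbankTree implementation. The parameterization is an assumption borrowed from a common convention in the seedbank coalescent literature: each pair of active lineages coalesces at rate 1, each active lineage becomes dormant at rate `c`, and each dormant lineage reactivates at rate `c * K`. The function name, the parameter names `c` and `K`, and the returned quantities are hypothetical and chosen for illustration only.

```python
import random


def simulate_strong_seedbank(n_active, n_dormant, c, K, rng):
    """Simulate lineage counts backward in time until one lineage remains.

    Assumed rates (illustrative, not taken from the abstract):
      - each pair of ACTIVE lineages coalesces at rate 1
      - each active lineage goes dormant at rate c
      - each dormant lineage reactivates at rate c * K

    Returns (tmrca, n_coalescences, active_length, dormant_length), where the
    two lengths are total branch lengths spent in each state.
    """
    if c <= 0 or K <= 0:
        raise ValueError("c and K must be positive")
    if n_active < 0 or n_dormant < 0 or n_active + n_dormant < 1:
        raise ValueError("need at least one sampled lineage")

    a, d = n_active, n_dormant
    t = 0.0
    n_coalescences = 0
    active_length = 0.0
    dormant_length = 0.0

    while a + d > 1:
        rate_coal = a * (a - 1) / 2.0  # dormant lineages never coalesce
        rate_sleep = c * a             # active -> dormant
        rate_wake = c * K * d          # dormant -> active
        total = rate_coal + rate_sleep + rate_wake

        # Exponential waiting time until the next event of any kind.
        dt = rng.expovariate(total)
        t += dt
        active_length += a * dt
        dormant_length += d * dt

        # Pick the event type in proportion to its rate.
        u = rng.random() * total
        if u < rate_coal:
            a -= 1
            n_coalescences += 1
        elif u < rate_coal + rate_sleep:
            a -= 1
            d += 1
        else:
            d -= 1
            a += 1

    return t, n_coalescences, active_length, dormant_length


def expected_mutations(active_length, dormant_length, mu_active, mu_dormant):
    """Expected mutation count when dormant lineages mutate at their own rate."""
    return mu_active * active_length + mu_dormant * dormant_length
```

For example, calling `simulate_strong_seedbank(5, 3, 1.0, 2.0, random.Random(0))` draws one realization for five active and three dormant samples. Passing the two returned branch lengths to `expected_mutations` with `mu_dormant < mu_active` shows how time spent dormant contributes fewer mutations per unit of branch length, which is one reason the strong seedbank coalescent cannot be reduced to a Kingman coalescent with a rescaled mutation rate.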