School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia.
Evolution of Cultural Diversity Initiative, The Australian National University, Canberra, Australian Capital Territory, Australia.
Mol Ecol Resour. 2024 Apr;24(3):e13930. doi: 10.1111/1755-0998.13930. Epub 2024 Jan 21.
Population genetic simulation has become a common tool for investigating increasingly complex evolutionary and demographic models. Software capable of handling highly complex models has recently been developed, and advances in tree sequence recording now allow simulations to combine the efficiency and genealogical insight of coalescent simulations with the flexibility of forward simulations. However, frameworks that use these features have not yet been compared and benchmarked. Here, we evaluate various simulation workflows using the coalescent simulator msprime and the forward simulator SLiM, to assess resource efficiency and determine an optimal simulation framework. Three aspects were evaluated: (1) the burn-in, which establishes an equilibrium level of neutral diversity in the population; (2) the forward simulation, in which temporally fluctuating selection acts; and (3) the final computation of summary statistics. We provide typical memory and computation time requirements for each step. We find that the fastest framework, a combination of coalescent and forward simulation with tree sequence recording, increases simulation speed by over twenty times compared to classical forward simulations without tree sequence recording, although it requires six times more memory. Overall, efficient simulation workflows can substantially improve the modelling of complex evolutionary scenarios, although the optimal framework ultimately depends on the available computational resources.
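The trade-off stated in the abstract (over twenty times faster, six times more memory) can be turned into a quick feasibility check. The following is a minimal sketch, not part of the paper: the function names, the workflow labels, and the baseline figures in the usage note are hypothetical, the speed-up is treated as a lower bound of 20, and "six times more memory" is assumed to mean a factor of 6 relative to the classical forward simulation.

```python
from dataclasses import dataclass

# Ratios quoted in the abstract. Both are simplifying assumptions here:
# "over twenty times" is treated as a lower bound of 20, and
# "six times more memory" is read as a multiplicative factor of 6.
SPEEDUP = 20.0
MEMORY_FACTOR = 6.0

CLASSICAL = "classical forward simulation"
HYBRID = "coalescent burn-in + forward simulation with tree sequence recording"


@dataclass
class Estimate:
    workflow: str
    hours: float
    memory_gb: float


def estimate_workflows(baseline_hours, baseline_memory_gb):
    """Scale a measured classical run (hypothetical inputs) by the quoted ratios."""
    classical = Estimate(CLASSICAL, baseline_hours, baseline_memory_gb)
    hybrid = Estimate(
        HYBRID,
        baseline_hours / SPEEDUP,
        baseline_memory_gb * MEMORY_FACTOR,
    )
    return [classical, hybrid]


def choose_workflow(baseline_hours, baseline_memory_gb, memory_budget_gb):
    """Return the fastest estimate that fits the memory budget, or None."""
    feasible = [
        e
        for e in estimate_workflows(baseline_hours, baseline_memory_gb)
        if e.memory_gb <= memory_budget_gb
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda e: e.hours)
```

For example, with a hypothetical classical run of 40 hours and 2 GB, `choose_workflow(40.0, 2.0, 16.0)` selects the hybrid workflow, whereas a budget of 8 GB falls back to the classical one. Actual requirements depend on the model and should be taken from the per-step figures reported in the paper.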