Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York.
Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon.
Mol Ecol Resour. 2019 Mar;19(2):552-566. doi: 10.1111/1755-0998.12968. Epub 2019 Feb 21.
There is an increasing demand for evolutionary models to incorporate relatively realistic dynamics, ranging from selection at many genomic sites to complex demography, population structure, and ecological interactions. Such models can generally be implemented as individual-based forward simulations, but the large computational overhead of these models often makes simulation of whole chromosome sequences in large populations infeasible. This situation presents an important obstacle to the field that requires conceptual advances to overcome. The recently developed tree-sequence recording method (Kelleher, Thornton, Ashander, & Ralph, 2018), which stores the genealogical history of all genomes in the simulated population, could provide such an advance. This method has several benefits: (1) it allows neutral mutations to be omitted entirely from forward-time simulations and added later, thereby dramatically improving computational efficiency; (2) it allows neutral burn-in to be constructed extremely efficiently after the fact, using "recapitation"; (3) it allows direct examination and analysis of the genealogical trees along the genome; and (4) it provides a compact representation of a population's genealogy that can be analysed in Python using the msprime package. We have implemented the tree-sequence recording method in SLiM 3 (a free, open-source evolutionary simulation software package) and extended it to allow the recording of non-neutral mutations, greatly broadening the utility of this method. To demonstrate the versatility and performance of this approach, we showcase several practical applications that would have been beyond the reach of previously existing methods, opening up new horizons for the modelling and exploration of evolutionary processes.
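Benefit (1) can be illustrated with a toy sketch: the forward simulation records only who descends from whom, and neutral mutations are placed afterwards on the branches that are ancestral to the final sample. The code below is a minimal illustration under strong simplifying assumptions (non-recombining haploid genomes, a constant-size Wright-Fisher population, at most one mutation per one-generation branch). All function names are hypothetical; it does not use or reproduce the SLiM or msprime APIs.

```python
import random


def simulate_genealogy(pop_size, generations, rng):
    """Forward Wright-Fisher simulation that records only parentage.

    No mutations are tracked here. Returns a dict mapping each node id
    to its parent's node id, plus the node ids of the final generation.
    """
    parent = {}
    current = list(range(pop_size))  # founders have no recorded parent
    next_id = pop_size
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            parent[next_id] = rng.choice(current)
            offspring.append(next_id)
            next_id += 1
        current = offspring
    return parent, current


def ancestral_nodes(parent, samples):
    """Return the nodes on any lineage leading back from the samples."""
    keep = set()
    for node in samples:
        while node is not None and node not in keep:
            keep.add(node)
            node = parent.get(node)  # None once a founder is reached
    return keep


def add_neutral_mutations(parent, keep, mu, rng):
    """Overlay neutral mutations after the forward simulation has ended.

    Each retained parent-child branch spans one generation and receives
    a new mutation with probability mu (a Bernoulli approximation).
    Returns a dict mapping the child node of a branch to a mutation id.
    """
    mutations = {}
    for node in sorted(keep):
        if node in parent and rng.random() < mu:
            mutations[node] = len(mutations)
    return mutations


def haplotype(parent, mutations, sample):
    """Collect the mutation ids carried by one sampled genome."""
    carried = []
    node = sample
    while node in parent:
        if node in mutations:
            carried.append(mutations[node])
        node = parent[node]
    return carried


if __name__ == "__main__":
    rng = random.Random(1)
    parent, samples = simulate_genealogy(pop_size=100, generations=500, rng=rng)
    keep = ancestral_nodes(parent, samples)
    mutations = add_neutral_mutations(parent, keep, mu=0.01, rng=rng)
    print("nodes simulated:", len(parent) + 100)
    print("nodes ancestral to the sample:", len(keep))
    print("neutral mutations placed:", len(mutations))
```

Because most simulated lineages leave no descendants in the final generation, mutations only need to be drawn for the much smaller set of ancestral branches, which is where the efficiency gain in this sketch comes from.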