Mallo Diego, De Oliveira Martins Leonardo, Posada David
Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
Syst Biol. 2016 Mar;65(2):334-44. doi: 10.1093/sysbio/syv082. Epub 2015 Nov 1.
We present SimPhy, a fast and flexible software package for the simulation of multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer (all three potentially leading to species tree/gene tree discordance), as well as gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific) that can be fixed or sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and can simulate partitioned nucleotide, codon, and protein multilocus sequence alignments under a wide range of substitution models using the program INDELible. We validate SimPhy's output against theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can help to understand interactions among different evolutionary processes by conducting a simulation study that characterizes the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, precompiled executables, a detailed manual, and example cases.
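The hierarchical sampling idea described above (a genome-wide parameter modified by locus-specific and lineage-specific factors drawn from statistical distributions) can be illustrated with a minimal sketch. This is not SimPhy's code or interface: the function name, the choice of gamma distributions with mean 1, and all default values are assumptions made purely for illustration, since the abstract does not specify the parameterization.

```python
import random


def sample_hierarchical_rates(n_loci, n_lineages, genome_rate=1e-8,
                              locus_alpha=2.0, lineage_alpha=5.0, seed=0):
    """Hypothetical sketch: return an n_loci x n_lineages table of rates.

    Each rate is genome_rate * locus multiplier * lineage multiplier.
    All names and defaults are illustrative assumptions, not SimPhy options.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_loci):
        # Locus-specific multiplier: gamma with shape alpha and scale 1/alpha,
        # so its mean is 1 and the genome-wide rate is preserved on average.
        locus_mult = rng.gammavariate(locus_alpha, 1.0 / locus_alpha)
        row = []
        for _ in range(n_lineages):
            # Lineage multipliers are drawn independently of one another,
            # which is what "uncorrelated" means for a relaxed clock.
            lineage_mult = rng.gammavariate(lineage_alpha, 1.0 / lineage_alpha)
            row.append(genome_rate * locus_mult * lineage_mult)
        rates.append(row)
    return rates


if __name__ == "__main__":
    for locus_rates in sample_hierarchical_rates(n_loci=3, n_lineages=4):
        print(["%.3e" % r for r in locus_rates])
```

Fixing a parameter instead of sampling it corresponds, in this sketch, to replacing the relevant draw with a constant multiplier of 1.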