Department of Neuroscience, Biomedical Centre, Uppsala University, Box 593, SE-751 24 Uppsala, Sweden.
BMC Evol Biol. 2010 Feb 13;10:42. doi: 10.1186/1471-2148-10-42.
Phylogenetic trees based on sequences from a set of taxa can be incongruent due to horizontal gene transfer (HGT). By identifying the HGT events, we can reconcile the gene trees and derive a taxon tree that adequately represents the species' evolutionary history. One HGT can be represented by a rooted Subtree Prune and Regraft (RSPR) operation and the number of RSPRs separating two trees corresponds to the minimum number of HGT events. Identifying the minimum number of RSPRs separating two trees is NP-hard, but the problem can be reduced to fixed parameter tractable. A number of heuristic and two exact approaches to identifying the minimum number of RSPRs have been proposed. This is the first implementation delivering an exact solution as well as the intermediate trees connecting the input trees.
We present the SPR Identification Tool (SPRIT), a novel algorithm that solves the fixed parameter tractable minimum RSPR problem and its GPL licensed Java implementation. The algorithm can be used in two ways, exhaustive search that guarantees the minimum RSPR distance and a heuristic approach that guarantees finding a solution, but not necessarily the minimum one. We benchmarked SPRIT against other software in two different settings, small to medium sized trees i.e. five to one hundred taxa and large trees i.e. thousands of taxa. In the small to medium tree size setting with random artificial incongruence, SPRIT's heuristic mode outperforms the other software by always delivering a solution with a low overestimation of the RSPR distance. In the large tree setting SPRIT compares well to the alternatives when benchmarked on finding a minimum solution within a reasonable time. SPRIT presents both the minimum RSPR distance and the intermediate trees.
When used in exhaustive search mode, SPRIT identifies the minimum number of RSPRs needed to reconcile two incongruent rooted trees. SPRIT also performs quick approximations of the minimum RSPR distance, which are comparable to, and often better than, purely heuristic solutions. Put together, SPRIT is an excellent tool for identification of HGT events and pinpointing which taxa have been involved in HGT.
基于一组分类单元的序列构建的系统发育树可能由于水平基因转移(HGT)而不一致。通过识别 HGT 事件,我们可以协调基因树,并得出一个充分代表物种进化历史的分类单元树。一个 HGT 可以用一个有根的子树修剪和重新连接(RSPR)操作来表示,两个树之间的 RSPR 数量对应于 HGT 事件的最小数量。识别两个树之间的最小 RSPR 数量是 NP 难问题,但可以将问题简化为固定参数可解。已经提出了一些启发式方法和两种确定最小 RSPR 数量的精确方法。这是第一个提供精确解决方案以及连接输入树的中间树的实现。
我们提出了 SPR 识别工具(SPRIT),这是一种解决固定参数可解最小 RSPR 问题的新算法及其 GPL 许可的 Java 实现。该算法可以以两种方式使用,一种是保证最小 RSPR 距离的穷举搜索,另一种是保证找到解决方案但不一定是最小解决方案的启发式方法。我们在两种不同的设置中,即从小到中等大小的树(即五到一百个分类单元)和大的树(即数千个分类单元)中,对 SPRIT 与其他软件进行了基准测试。在随机人工不一致的小到中等大小的树大小设置中,SPRIT 的启发式模式通过始终提供具有低 RSPR 距离高估的解决方案来优于其他软件。在大型树设置中,SPRIT 在合理的时间内找到最小解决方案时与替代方案相比表现良好。SPRIT 同时提供最小 RSPR 距离和中间树。
当在穷举搜索模式下使用时,SPRIT 可以识别协调两个不一致的有根树所需的最小 RSPR 数量。SPRIT 还可以快速近似最小 RSPR 距离,这些距离与纯启发式解决方案相当,并且通常更好。总的来说,SPRIT 是识别 HGT 事件并确定哪些分类单元参与 HGT 的绝佳工具。