Faculty of Computer Science, Dalhousie University, 6050 University Avenue, PO Box 15000, Halifax, Nova Scotia, Canada B3H 4R2.
Syst Biol. 2014 Jul;63(4):566-81. doi: 10.1093/sysbio/syu023. Epub 2014 Apr 2.
Supertree methods reconcile a set of phylogenetic trees into a single structure that is often interpreted as a branching history of species. A key challenge is combining conflicting evolutionary histories that are due to artifacts of phylogenetic reconstruction and phenomena such as lateral gene transfer (LGT). Many supertree approaches use optimality criteria that do not reflect underlying processes, have known biases, and may be unduly influenced by LGT. We present the first method to construct supertrees by using the subtree prune-and-regraft (SPR) distance as an optimality criterion. Although calculating the rooted SPR distance between a pair of trees is NP-hard, our new maximum agreement forest-based methods can reconcile trees with hundreds of taxa and >50 transfers in fractions of a second, which enables repeated calculations during the course of an iterative search. Our approach can accommodate trees in which uncertain relationships have been collapsed to multifurcating nodes. Using a series of benchmark datasets simulated under plausible rates of LGT, we show that SPR supertrees are more similar to correct species histories than supertrees based on parsimony or Robinson-Foulds distance criteria. We successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla. Our SPR-based approach also allowed direct inference of highways of gene transfer between bacterial classes and genera. A small number of these highways connect genera in different phyla and can highlight specific genes implicated in long-distance LGT. [Lateral gene transfer; matrix representation with parsimony; phylogenomics; prokaryotic phylogeny; Robinson-Foulds; subtree prune-and-regraft; supertrees.]
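To make the optimality criterion concrete, the following is a minimal sketch of a rooted SPR distance and of a supertree score built from it. It is not the maximum agreement forest algorithm described in the abstract: it is a brute-force breadth-first search over SPR moves, usable only on trees with a handful of leaves. The tree encoding (nested 2-tuples with string leaves), all function names, and the choice to score a candidate supertree by the sum of its SPR distances to each gene tree (after restricting the supertree to that gene tree's taxa) are assumptions made for illustration.

```python
from collections import deque

def canon(t):
    """Canonical form of a rooted binary tree (children in a fixed order)."""
    if isinstance(t, str):
        return t
    return tuple(sorted((canon(t[0]), canon(t[1])), key=repr))

def subtrees(t):
    """Yield every subtree of t, including t itself."""
    yield t
    if not isinstance(t, str):
        for child in t:
            yield from subtrees(child)

def prune(t, s):
    """Remove subtree s from t and suppress the resulting degree-2 node."""
    if t == s:
        return None
    if isinstance(t, str):
        return t
    left, right = prune(t[0], s), prune(t[1], s)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def graft(t, u, s):
    """Attach s as a sibling of subtree u (i.e. on the edge above u)."""
    if t == u:
        return (t, s)
    if isinstance(t, str):
        return t
    return (graft(t[0], u, s), graft(t[1], u, s))

def neighbors(t):
    """All trees reachable from t by one rooted SPR move."""
    for s in subtrees(t):
        if s == t:
            continue  # the whole tree cannot be pruned from itself
        rest = prune(t, s)
        for u in list(subtrees(rest)):
            yield canon(graft(rest, u, s))

def spr_distance(t1, t2):
    """Brute-force rooted SPR distance via BFS (tiny trees only)."""
    start, goal = canon(t1), canon(t2)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        tree, dist = queue.popleft()
        if tree == goal:
            return dist
        for n in neighbors(tree):
            if n not in seen:
                seen.add(n)
                queue.append((n, dist + 1))
    raise ValueError("trees must have the same leaf set")

def leaves(t):
    return {t} if isinstance(t, str) else leaves(t[0]) | leaves(t[1])

def restrict(t, keep):
    """Restrict t to the taxa in keep, suppressing degree-2 nodes."""
    if isinstance(t, str):
        return t if t in keep else None
    left, right = restrict(t[0], keep), restrict(t[1], keep)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def spr_score(supertree, gene_trees):
    """Assumed criterion: total SPR distance from the supertree to each gene tree."""
    return sum(spr_distance(restrict(supertree, leaves(g)), g) for g in gene_trees)
```

Under this sketch, an iterative supertree search would propose candidate trees and keep the one with the lowest `spr_score`. The exhaustive search here grows far too quickly for real data, which is why fast pairwise distance calculations matter for a search that must evaluate the score repeatedly.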