School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America.
PLoS One. 2011;6(8):e22483. doi: 10.1371/journal.pone.0022483. Epub 2011 Aug 24.
Maximum parsimony (MP) methods aim to reconstruct the phylogeny of extant species by finding the most parsimonious evolutionary scenario using the species' genome data. MP methods are considered to be accurate, but they are also computationally expensive especially for a large number of species. Several disk-covering methods (DCMs), which decompose the input species to multiple overlapping subgroups (or disks), have been proposed to solve the problem in a divide-and-conquer way. We design a new DCM based on the spectral method and also develop the COGNAC (Comparing Orders of Genes using Novel Algorithms and high-performance Computers) software package. COGNAC uses the new DCM to reduce the phylogenetic tree search space and selects an output tree from the reduced search space based on the MP principle. We test the new DCM using gene order data and inversion distance. The new DCM not only reduces the number of candidate tree topologies but also excludes erroneous tree topologies which can be selected by original MP methods. Initial labeling of internal genomes affects the accuracy of MP methods using gene order data, and the new DCM enables more accurate initial labeling as well. COGNAC demonstrates superior accuracy as a consequence. We compare COGNAC with FastME and the combination of the state of the art DCM (Rec-I-DCM3) and GRAPPA. COGNAC clearly outperforms FastME in accuracy. COGNAC--using the new DCM--also reconstructs a much more accurate tree in significantly shorter time than GRAPPA with Rec-I-DCM3.
最大简约法(MP)旨在通过使用物种的基因组数据找到最简约的进化情景,从而重建现存物种的系统发育。MP 方法被认为是准确的,但它们的计算成本也很高,尤其是对于大量的物种。已经提出了几种磁盘覆盖方法(DCM),这些方法将输入的物种分解为多个重叠的子组(或磁盘),以便以分而治之的方式解决问题。我们设计了一种基于谱方法的新 DCM,并且还开发了 COGNAC(使用新算法和高性能计算机比较基因顺序)软件包。COGNAC 使用新的 DCM 来缩小系统发育树搜索空间,并根据 MP 原则从缩小的搜索空间中选择输出树。我们使用基因顺序数据和倒位距离来测试新的 DCM。新的 DCM 不仅减少了候选树拓扑结构的数量,而且还排除了原始 MP 方法可以选择的错误树拓扑结构。基因顺序数据中内部基因组的初始标记会影响 MP 方法的准确性,而新的 DCM 使得初始标记更加准确。因此,COGNAC 的准确性更高。我们将 COGNAC 与 FastME 进行了比较,还将最先进的 DCM(Rec-I-DCM3)与 GRAPPA 进行了比较。COGNAC 在准确性方面明显优于 FastME。与使用 Rec-I-DCM3 的 GRAPPA 相比,使用新 DCM 的 COGNAC 重建出的树更加准确,而且时间也大大缩短。