Rusinko Joseph P, Hipp Brian
Department of Mathematics, Winthrop University, 142 Bancroft Hall, Rock Hill, SC 29733, USA.
Algorithms Mol Biol. 2012 Dec 6;7(1):35. doi: 10.1186/1748-7188-7-35.
Invariant-based algorithms for phylogenetic reconstruction, first proposed by Cavender and Felsenstein, and by Lake, were widely dismissed by practicing biologists because invariants were perceived to have limited accuracy in constructing trees from DNA sequences of reasonable length. Recent developments by algebraic geometers have led to the construction of lists of invariants that have been demonstrated to be more accurate on short sequences, but these could only be used for trees with small numbers of taxa. We have developed and tested an invariant-based quartet puzzling algorithm that is accurate and efficient for biologically reasonable data sets.
We found that our algorithm outperforms maximum likelihood-based quartet puzzling on data sets simulated with low to medium evolutionary rates. For faster rates of evolution, invariant-based quartet puzzling is reasonable but less effective than maximum likelihood-based puzzling.
This is a proof-of-concept algorithm that is not intended to replace existing reconstruction algorithms. Rather, the conclusion is that invariant-based methods should be considered when seeking solutions to a new wave of phylogenetic problems (supertree algorithms, gene trees vs. species trees, mixture models). This article demonstrates that invariants are a practical, reasonable, and flexible source for reconstruction techniques.
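To make the idea of an invariant concrete, the following is a minimal sketch of how an invariant can score the three possible topologies of a single quartet. It is not the authors' algorithm or their list of invariants. It assumes a two-state symmetric substitution model and uses a rank condition on the flattened site-pattern table (all 3x3 minors vanish for the true split) as the invariant. Every function name and all the flip probabilities are hypothetical, chosen only for illustration.

```python
from itertools import combinations, product

# The three unrooted topologies on taxa (a, b, c, d), given as index pairs.
SPLITS = {
    "ab|cd": ((0, 1), (2, 3)),
    "ac|bd": ((0, 2), (1, 3)),
    "ad|bc": ((0, 3), (1, 2)),
}

def quartet_pattern_probs(leaf_flips, internal_flip):
    """Exact site-pattern probabilities for the tree ab|cd.

    Assumption: two-state symmetric model, uniform root distribution,
    and each edge described by its probability of a state change.
    """
    def edge(p, x, y):
        return 1.0 - p if x == y else p

    probs = {}
    for a, b, c, d in product((0, 1), repeat=4):
        total = 0.0
        # Sum over the states u, v of the two internal nodes.
        for u, v in product((0, 1), repeat=2):
            total += (0.5 * edge(internal_flip, u, v)
                      * edge(leaf_flips[0], u, a) * edge(leaf_flips[1], u, b)
                      * edge(leaf_flips[2], v, c) * edge(leaf_flips[3], v, d))
        probs[(a, b, c, d)] = total
    return probs

def flatten(probs, split):
    """Arrange the 16 pattern probabilities as a 4x4 matrix for one split."""
    row_taxa, col_taxa = split
    states = list(product((0, 1), repeat=2))
    matrix = []
    for r in states:
        row = []
        for c in states:
            pattern = [0, 0, 0, 0]
            pattern[row_taxa[0]], pattern[row_taxa[1]] = r
            pattern[col_taxa[0]], pattern[col_taxa[1]] = c
            row.append(probs[tuple(pattern)])
        matrix.append(row)
    return matrix

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def invariant_score(matrix):
    """Sum of squared 3x3 minors; zero when the matrix has rank at most 2."""
    score = 0.0
    for rows in combinations(range(4), 3):
        for cols in combinations(range(4), 3):
            minor = [[matrix[r][c] for c in cols] for r in rows]
            score += det3(minor) ** 2
    return score

def best_quartet(probs):
    """Return the split whose invariants are closest to vanishing."""
    scores = {name: invariant_score(flatten(probs, split))
              for name, split in SPLITS.items()}
    return min(scores, key=scores.get), scores

# Hypothetical edge parameters for a tree whose true split is ab|cd.
probs = quartet_pattern_probs([0.1, 0.2, 0.15, 0.05], 0.1)
name, scores = best_quartet(probs)
print(name, scores)
```

With exact pattern probabilities the score of the true split is zero up to rounding error, while the other two splits score strictly higher. With real data the probabilities would be replaced by observed site-pattern frequencies, and a quartet puzzling step would then assemble a larger tree from many such quartet choices.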