Deepak Akshay, Fernández-Baca David
Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa, USA.
Department of Computer Science, Iowa State University, Ames, Iowa, USA.
Algorithms Mol Biol. 2014 Jun 18;9:16. doi: 10.1186/1748-7188-9-16. eCollection 2014.
A common problem in phylogenetic analysis is to identify frequent patterns in a collection of phylogenetic trees. The goal is, roughly, to find a subset of the species (taxa) on which all or some significant subset of the trees agree. One popular method to do so is through maximum agreement subtrees (MASTs). MASTs are also used, among other things, as a metric for comparing phylogenetic trees, computing congruence indices and to identify horizontal gene transfer events.
We give algorithms and experimental results for two approaches to identify common patterns in a collection of phylogenetic trees, one based on agreement subtrees, called maximal agreement subtrees, the other on frequent subtrees, called maximal frequent subtrees. These approaches can return subtrees on larger sets of taxa than MASTs, and can reveal new common phylogenetic relationships not present in either MASTs or the majority rule tree (a popular consensus method). Our current implementation is available on the web at https://code.google.com/p/mfst-miner/.
Our computational results confirm that maximal agreement subtrees and all maximal frequent subtrees can reveal a more complete phylogenetic picture of the common patterns in collections of phylogenetic trees than maximum agreement subtrees; they are also often more resolved than the majority rule tree. Further, our experiments show that enumerating maximal frequent subtrees is considerably more practical than enumerating ordinary (not necessarily maximal) frequent subtrees.
系统发育分析中的一个常见问题是在一组系统发育树中识别频繁出现的模式。大致目标是找到一个物种(分类单元)子集,所有或一些重要的树子集在该子集上达成一致。一种常用的方法是通过最大一致子树(MAST)。MAST还用于其他方面,例如作为比较系统发育树的度量、计算一致性指数以及识别水平基因转移事件。
我们给出了两种在一组系统发育树中识别常见模式的方法的算法和实验结果,一种基于一致子树,称为最大一致子树,另一种基于频繁子树,称为最大频繁子树。这些方法可以返回比MAST更大分类单元集上的子树,并且可以揭示MAST或多数规则树(一种流行的共识方法)中不存在的新的常见系统发育关系。我们当前的实现可在https://code.google.com/p/mfst-miner/网站上获取。
我们的计算结果证实,与最大一致子树相比,最大一致子树和所有最大频繁子树能够揭示系统发育树集合中常见模式更完整的系统发育图景;它们通常也比多数规则树的解析度更高。此外,我们的实验表明,枚举最大频繁子树比枚举普通(不一定是最大的)频繁子树要实用得多。