Suppr超能文献

枚举系统发育树集合中的所有最大频繁子树。

Enumerating all maximal frequent subtrees in collections of phylogenetic trees.

作者信息

Deepak Akshay, Fernández-Baca David

机构信息

Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa, USA.

Department of Computer Science, Iowa State University, Ames, Iowa, USA.

出版信息

Algorithms Mol Biol. 2014 Jun 18;9:16. doi: 10.1186/1748-7188-9-16. eCollection 2014.

Abstract

BACKGROUND

A common problem in phylogenetic analysis is to identify frequent patterns in a collection of phylogenetic trees. The goal is, roughly, to find a subset of the species (taxa) on which all or some significant subset of the trees agree. One popular method to do so is through maximum agreement subtrees (MASTs). MASTs are also used, among other things, as a metric for comparing phylogenetic trees, computing congruence indices and to identify horizontal gene transfer events.

RESULTS

We give algorithms and experimental results for two approaches to identify common patterns in a collection of phylogenetic trees, one based on agreement subtrees, called maximal agreement subtrees, the other on frequent subtrees, called maximal frequent subtrees. These approaches can return subtrees on larger sets of taxa than MASTs, and can reveal new common phylogenetic relationships not present in either MASTs or the majority rule tree (a popular consensus method). Our current implementation is available on the web at https://code.google.com/p/mfst-miner/.

CONCLUSIONS

Our computational results confirm that maximal agreement subtrees and all maximal frequent subtrees can reveal a more complete phylogenetic picture of the common patterns in collections of phylogenetic trees than maximum agreement subtrees; they are also often more resolved than the majority rule tree. Further, our experiments show that enumerating maximal frequent subtrees is considerably more practical than enumerating ordinary (not necessarily maximal) frequent subtrees.

摘要

背景

系统发育分析中的一个常见问题是在一组系统发育树中识别频繁出现的模式。大致目标是找到一个物种(分类单元)子集,所有或一些重要的树子集在该子集上达成一致。一种常用的方法是通过最大一致子树(MAST)。MAST还用于其他方面,例如作为比较系统发育树的度量、计算一致性指数以及识别水平基因转移事件。

结果

我们给出了两种在一组系统发育树中识别常见模式的方法的算法和实验结果,一种基于一致子树,称为最大一致子树,另一种基于频繁子树,称为最大频繁子树。这些方法可以返回比MAST更大分类单元集上的子树,并且可以揭示MAST或多数规则树(一种流行的共识方法)中不存在的新的常见系统发育关系。我们当前的实现可在https://code.google.com/p/mfst-miner/网站上获取。

结论

我们的计算结果证实,与最大一致子树相比,最大一致子树和所有最大频繁子树能够揭示系统发育树集合中常见模式更完整的系统发育图景;它们通常也比多数规则树的解析度更高。此外,我们的实验表明,枚举最大频繁子树比枚举普通(不一定是最大的)频繁子树要实用得多。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ccc/4085724/1defd8ba9819/1748-7188-9-16-1.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验