A scalable method for identifying frequent subtrees in sets of large phylogenetic trees.

Affiliation

Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA.

Publication information

BMC Bioinformatics. 2012 Oct 3;13:256. doi: 10.1186/1471-2105-13-256.

Abstract

BACKGROUND

We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees.

RESULTS

We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods.
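The seed, grow, and post-process phases described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the nested-tuple tree encoding, the use of three-taxon seeds, the greedy one-taxon-at-a-time growth, and the names `restrict`, `support`, `grow`, and `heuristic_mfast` are all assumptions made for illustration, and the sketch assumes every input tree is rooted and contains the same taxa.

```python
from itertools import combinations

def restrict(tree, taxa):
    """Canonical form of a rooted tree (nested tuples of leaf names)
    restricted to `taxa`; returns None if no leaf survives."""
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kept = [r for r in (restrict(child, taxa) for child in tree) if r is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # suppress nodes left with a single child
    return tuple(sorted(kept, key=repr))  # child order does not matter

def support(trees, taxa):
    """Fraction of trees that share the most common topology on `taxa`."""
    counts = {}
    for tree in trees:
        key = restrict(tree, taxa)
        counts[key] = counts.get(key, 0) + 1
    return max(counts.values()) / len(trees)

def grow(trees, seed, all_taxa, min_freq):
    """Greedily extend a seed one taxon at a time while it stays frequent.
    Support never increases when taxa are added, so a taxon rejected once
    stays rejected and a single pass leaves nothing that can be added."""
    current = set(seed)
    for taxon in sorted(all_taxa - current):
        if support(trees, current | {taxon}) >= min_freq:
            current.add(taxon)
    return current

def heuristic_mfast(trees, all_taxa, min_freq):
    """Phase 1: frequent three-taxon seeds (an assumed seed size).
    Phase 2: grow each seed. Keep the largest candidate found."""
    seeds = [set(c) for c in combinations(sorted(all_taxa), 3)
             if support(trees, set(c)) >= min_freq]
    best = set()
    for seed in seeds:
        candidate = grow(trees, seed, all_taxa, min_freq)
        if len(candidate) > len(best):
            best = candidate
    return best

# Three hypothetical five-taxon trees used only for illustration.
trees = [
    (((("a", "b"), "c"), "d"), "e"),
    (((("a", "b"), "c"), "e"), "d"),
    (((("a", "b"), "d"), "c"), "e"),
]
taxa = {"a", "b", "c", "d", "e"}
result = heuristic_mfast(trees, taxa, min_freq=0.6)
print(sorted(result), support(trees, result))
```

Enumerating all three-taxon seeds costs cubic time in the number of taxa, so this sketch would not scale to the dataset sizes the abstract targets; it is meant only to make the three phases concrete on a toy input.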

CONCLUSIONS

Although this heuristic is not guaranteed to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical datasets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ca5/3543182/5e8a8a1b76d7/1471-2105-13-256-1.jpg
