A metric for phylogenetic trees based on matching.

Affiliation

Laboratory for Computational Biology and Bioinformatics, School of Computer and Communication Sciences, Swiss Federal Institute of Technology-EPFL, INJ 211, Station 14, Lausanne CH-1015, Switzerland.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 Jul-Aug;9(4):1014-22. doi: 10.1109/TCBB.2011.157.

Abstract

Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems varying from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes: reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce a new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance.
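The leaf-reattachment problem described in the abstract can be reproduced on a small example. The sketch below is illustrative only and is not the authors' code: trees are written as nested tuples, and the helper names (`caterpillar`, `leaves`, `splits`, `rf_distance`, `matching_distance`) are hypothetical. The Robinson-Foulds distance is computed in the usual way, as the number of non-trivial bipartitions found in exactly one of the two trees. The abstract does not state how the matching is weighted, so `matching_distance` assumes one plausible formulation: a minimum-weight perfect matching between the bipartitions of two binary trees, where the cost of pairing two bipartitions is the number of leaves that must switch sides to make them equal. The matching is found by brute force, which is only practical for very small trees.

```python
from itertools import permutations


def caterpillar(order):
    """Build a caterpillar (ladder) tree as nested tuples from a leaf order."""
    tree = (order[0], order[1])
    for leaf in order[2:]:
        tree = (tree, leaf)
    return tree


def leaves(tree):
    """Return the set of leaf labels under a node."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side containing the smallest leaf
    label, so that identical bipartitions compare equal.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    result = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            stack.extend(node)
            side = leaves(node)
            other = all_leaves - side
            # Skip trivial splits (one leaf, or everything, on one side).
            if len(side) >= 2 and len(other) >= 2:
                result.add(side if ref in side else other)
    return result


def rf_distance(t1, t2):
    """Robinson-Foulds distance: bipartitions present in exactly one tree."""
    return len(splits(t1) ^ splits(t2))


def matching_distance(t1, t2):
    """ASSUMED formulation, not taken from the abstract: minimum total cost
    of a perfect matching between the two sets of bipartitions."""
    n = len(leaves(t1))
    a, b = list(splits(t1)), list(splits(t2))
    if len(a) != len(b):
        raise ValueError("sketch assumes two binary trees on the same leaves")

    def cost(x, y):
        # Leaves that must change sides; both orientations are considered.
        d = len(x ^ y)
        return min(d, n - d)

    # Brute force over all pairings: factorial time, small trees only.
    return min(
        sum(cost(x, y) for x, y in zip(a, perm))
        for perm in permutations(b)
    )


n = 6
t1 = caterpillar(list(range(1, n + 1)))        # leaf order 1, 2, 3, 4, 5, 6
t2 = caterpillar(list(range(2, n + 1)) + [1])  # leaf 1 reattached at the far end
print(rf_distance(t1, t2), 2 * (n - 3))        # 6 6: RF reaches its maximum
print(matching_distance(t1, t2))
```

Moving leaf 1 from one end of the caterpillar to the other leaves no bipartition in common, so the Robinson-Foulds distance jumps to its maximum of 2(n - 3) for binary trees on n leaves. Under the assumed pairing cost the matching-based value stays well below the value obtained when every pair is matched at its worst. For trees of realistic size, the brute-force search would have to be replaced by a polynomial-time assignment solver (for example SciPy's `linear_sum_assignment`) to reach the low polynomial running time the abstract mentions.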
