Information theoretic generalized Robinson-Foulds metrics for comparing phylogenetic trees.

Affiliation

Department of Earth Sciences, Lower Mountjoy, Durham University, Durham DH1 3LE, UK.

Publication information

Bioinformatics. 2020 Dec 22;36(20):5007-5013. doi: 10.1093/bioinformatics/btaa614.

Abstract

MOTIVATION

The Robinson-Foulds (RF) metric is widely used by biologists, linguists and chemists to quantify similarity between pairs of phylogenetic trees. The measure tallies the number of bipartition splits that occur in both trees, but this conservative approach ignores potential similarities between almost-identical splits, with undesirable consequences. 'Generalized' RF metrics address this shortcoming by pairing splits in one tree with similar splits in the other. Each pair is assigned a similarity score, and the sum of these scores quantifies the similarity between the two trees. The challenge lies in quantifying split similarity: existing definitions lack a principled statistical underpinning, resulting in misleading tree distances that are difficult to interpret. Here, I propose probabilistic measures of split similarity, which allow tree similarity to be measured in natural units (bits).
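To make the shortcoming concrete, the following is a minimal Python sketch of the split tally behind the RF metric. It is not the TreeDist implementation: the function names `canonical` and `rf_distance`, the six-leaf example trees, and the representation of each tree as a list of its non-trivial splits are all assumptions made for illustration.

```python
def canonical(split, leaves):
    """Name a bipartition by the side that excludes a fixed reference leaf."""
    all_leaves = frozenset(leaves)
    side = frozenset(split)
    reference = min(all_leaves)
    return all_leaves - side if reference in side else side


def rf_distance(splits_a, splits_b, leaves):
    """Count the splits that occur in exactly one of the two trees."""
    a = {canonical(s, leaves) for s in splits_a}
    b = {canonical(s, leaves) for s in splits_b}
    return len(a ^ b)


# Hypothetical six-leaf trees, each given by one side of its non-trivial splits.
leaves = "ABCDEF"
tree1 = [{"A", "B"}, {"A", "B", "C"}, {"A", "B", "C", "D"}]
tree2 = [{"A", "B"}, {"A", "B", "D"}, {"A", "B", "C", "D"}]  # ABD|CEF is close to ABC|DEF
tree3 = [{"A", "B"}, {"C", "D"}, {"A", "B", "C", "D"}]       # CD|ABEF is unlike ABC|DEF

print(rf_distance(tree1, tree2, leaves))
print(rf_distance(tree1, tree3, leaves))
```

In this sketch both comparisons give the same distance, because a split either matches exactly or contributes nothing, however similar its nearest counterpart in the other tree may be.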

RESULTS

My new information-theoretic metrics outperform alternative measures of tree similarity when evaluated against a broad suite of criteria, even though they do not account for the non-independence of splits within a single tree. Mutual clustering information exhibits none of the undesirable properties that characterize other tree comparison metrics, and should be preferred to the RF metric.
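As an illustration of scoring a pair of splits in bits, the sketch below computes the standard mutual information between two bipartitions of the same leaf set. It assumes that this quantity is the split-level score underlying mutual clustering information; the abstract does not give the definition, so the article and the TreeDist documentation remain the authority. The function name `split_mutual_information` and the example splits are hypothetical, and the step that pairs splits across two trees is not shown.

```python
from math import log2


def split_mutual_information(side_a, side_b, leaves):
    """Mutual information, in bits, between two bipartitions of `leaves`.

    Each split is treated as a two-cluster partition of the leaf set.
    This is an assumed reading of the split-level score, not the
    package's own code.
    """
    all_leaves = frozenset(leaves)
    n = len(all_leaves)
    parts_a = (frozenset(side_a), all_leaves - frozenset(side_a))
    parts_b = (frozenset(side_b), all_leaves - frozenset(side_b))
    total = 0.0
    for a in parts_a:
        for b in parts_b:
            joint = len(a & b) / n
            if joint > 0:  # empty overlaps contribute nothing
                total += joint * log2(joint / ((len(a) / n) * (len(b) / n)))
    return total


leaves = "ABCDEF"
identical = split_mutual_information("ABC", "ABC", leaves)  # same split
near_miss = split_mutual_information("ABC", "ABD", leaves)  # one leaf exchanged
unrelated = split_mutual_information("ABC", "CD", leaves)   # statistically independent
print(identical, near_miss, unrelated)
```

Under this assumption, two identical even splits of six leaves share one bit, the near-miss pair shares a small positive amount (about 0.08 bits), and the third pair shares none, so pairs that the RF tally treats as equally mismatched receive different scores.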

AVAILABILITY AND IMPLEMENTATION

The methods discussed in this article are implemented in the R package 'TreeDist', archived at https://dx.doi.org/10.5281/zenodo.3528123.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

