Measures of clade confidence do not correlate with accuracy of phylogenetic trees.

Author information

Hall Barry G, Salipante Stephen J

Affiliation

Bellingham Research Institute, Bellingham, Washington, United States of America.

Publication information

PLoS Comput Biol. 2007 Mar 16;3(3):e51. doi: 10.1371/journal.pcbi.0030051.

DOI:10.1371/journal.pcbi.0030051

PMID:17367204

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1828704/

Abstract

Metrics of phylogenetic tree reliability, such as parametric bootstrap percentages or Bayesian posterior probabilities, represent internal measures of the topological reproducibility of a phylogenetic tree, while the recently introduced aLRT (approximate likelihood ratio test) assesses the likelihood that a branch exists on a maximum-likelihood tree. Although those values are often equated with phylogenetic tree accuracy, they do not necessarily estimate how well a reconstructed phylogeny represents cladistic relationships that actually exist in nature. The authors have therefore attempted to quantify how well bootstrap percentages, posterior probabilities, and aLRT measures reflect the probability that a deduced phylogenetic clade is present in a known phylogeny. The authors simulated the evolution of bacterial genes of varying lengths under biologically realistic conditions, and reconstructed those known phylogenies using both maximum likelihood and Bayesian methods. Then, they measured how frequently clades in the reconstructed trees exhibiting particular bootstrap percentages, aLRT values, or posterior probabilities were found in the true trees. The authors have observed that none of these values correlate with the probability that a given clade is present in the known phylogeny. The major conclusion is that none of the measures provide any information about the likelihood that an individual clade actually exists. It is also found that the mean of all clade support values on a tree closely reflects the average proportion of all clades that have been assigned correctly, and is thus a good representation of the overall accuracy of a phylogenetic tree.
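The comparison described in the abstract (how often clades with a given support value appear in the true tree, and how mean support relates to overall accuracy) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the representation of a clade as a set of taxon labels, the 10-point bin width, and the toy data are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from collections import defaultdict


def accuracy_by_support_bin(clades, true_clades, bin_width=10):
    """Group reconstructed clades by support value (0-100) and return,
    for each bin, the fraction of clades that also occur in the true tree."""
    counts = defaultdict(lambda: [0, 0])  # bin start -> [n_correct, n_total]
    for taxa, support in clades:
        # A support of exactly 100 is folded into the top bin.
        start = min(int(support // bin_width) * bin_width, 100 - bin_width)
        counts[start][1] += 1
        if frozenset(taxa) in true_clades:
            counts[start][0] += 1
    return {start: correct / total
            for start, (correct, total) in sorted(counts.items())}


def mean_support_and_accuracy(clades, true_clades):
    """Return (mean support as a proportion, proportion of clades that are correct)."""
    mean_support = sum(support for _, support in clades) / len(clades) / 100
    n_correct = sum(frozenset(taxa) in true_clades for taxa, _ in clades)
    return mean_support, n_correct / len(clades)


# Hypothetical toy data: clades of the known (simulated) tree ...
true_clades = {
    frozenset({"A", "B"}),
    frozenset({"C", "D"}),
    frozenset({"A", "B", "C", "D"}),
}
# ... and clades of a reconstructed tree with their support values (percent).
reconstructed = [
    ({"A", "B"}, 98),
    ({"C", "E"}, 95),            # well supported but absent from the true tree
    ({"A", "B", "C", "D"}, 62),
]

per_bin = accuracy_by_support_bin(reconstructed, true_clades)
mean_support, accuracy = mean_support_and_accuracy(reconstructed, true_clades)
```

In this framing, the abstract's first finding corresponds to the per-bin fractions failing to track the support values, while the second corresponds to `mean_support` and `accuracy` being close when computed over a whole tree. A real analysis would pool clades from many simulated trees and extract them with a phylogenetics library rather than writing them by hand.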
