Lane Center for Computational Biology, Carnegie Mellon University, Mellon Institute Building, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA.
Syst Biol. 2011 Jul;60(4):528-40. doi: 10.1093/sysbio/syr021. Epub 2011 Apr 6.
Tree reconstruction methods are often judged by their accuracy, measured by how close they get to the true tree. Yet most reconstruction methods, such as maximum likelihood (ML), do not explicitly maximize this accuracy. To address this problem, we propose a Bayesian solution: given tree samples from the posterior, find the tree estimate that is closest on average to the samples. This "median" tree is known as the Bayes estimator (BE). The BE directly maximizes posterior expected accuracy, where accuracy is measured as closeness (distance) to the true tree. We discuss a unified framework of BE trees, focusing especially on tree distances that are expressible as squared Euclidean distances. Notable examples include Robinson-Foulds (RF) distance, quartet distance, and squared path difference. Using both simulated and real data, we show that BEs can be estimated in practice by hill-climbing. In our simulation, BEs tend to be closer to the true tree than ML and neighbor-joining estimates. In particular, the BE under squared path difference tends to perform well in terms of both path difference and RF distances.
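A useful consequence of writing a tree distance as a squared Euclidean distance (stated here as general background, not taken from the abstract) is that the posterior expected distance from a candidate tree t to sampled trees S equals ||t - mean||^2 plus a constant that does not depend on t. Minimizing expected distance therefore amounts to finding the tree whose vector lies closest to the mean of the sample vectors. RF distance, counted as the number of splits present in exactly one tree (some conventions halve this), is the squared Euclidean distance between 0/1 split-indicator vectors. The Python sketch below illustrates this under loudly stated assumptions and is not the authors' implementation. Trees are given directly as sets of nontrivial splits, written as the taxa on the side excluding reference taxon "A". The search scans a fixed candidate list instead of hill-climbing. The helper `tree` and the 5-taxon sample are hypothetical toy data.

```python
# Toy sketch of a Bayes estimator (BE) tree under Robinson-Foulds (RF) distance.
# Assumptions (not from the paper): each unrooted tree is given directly as a
# set of nontrivial splits, each split written as the frozenset of taxa on the
# side that excludes reference taxon "A"; the "search" is a scan over a fixed
# candidate list rather than hill-climbing.


def rf_distance(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    return len(t1 ^ t2)


def indicator(tree, universe):
    """0/1 split-indicator vector of a tree over a fixed split ordering."""
    return [1.0 if split in tree else 0.0 for split in universe]


def squared_euclidean(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def expected_rf(candidate, samples):
    """Monte Carlo estimate of the posterior expected RF distance."""
    return sum(rf_distance(candidate, s) for s in samples) / len(samples)


def distance_to_mean(candidate, samples, universe):
    """Squared Euclidean distance from the candidate to the mean sample vector."""
    vectors = [indicator(s, universe) for s in samples]
    mean = [sum(col) / len(samples) for col in zip(*vectors)]
    return squared_euclidean(indicator(candidate, universe), mean)


def bayes_estimate(candidates, samples):
    """Candidate with the smallest estimated expected RF distance (first wins ties)."""
    return min(candidates, key=lambda t: expected_rf(t, samples))


def tree(*splits):
    """Hypothetical helper: build a tree from split strings such as "CDE"."""
    return frozenset(frozenset(s) for s in splits)


# Hypothetical 5-taxon posterior sample; "CDE" stands for split AB|CDE, etc.
T1 = tree("CDE", "DE")  # ((A,B),C,(D,E))
T2 = tree("CDE", "CE")  # ((A,B),D,(C,E))
T3 = tree("BDE", "DE")  # ((A,C),B,(D,E))
samples = [T1, T1, T2, T3]
candidates = [T1, T2, T3]
# Fixed, deterministic ordering of every split seen in the sample.
universe = sorted(set().union(*samples), key=sorted)

if __name__ == "__main__":
    for t in candidates:
        label = sorted("".join(sorted(s)) for s in t)
        print(label, expected_rf(t, samples), distance_to_mean(t, samples, universe))
    best = bayes_estimate(candidates, samples)
    print("BE:", sorted("".join(sorted(s)) for s in best))
```

On this toy sample, T1 has estimated expected RF distance 1.0 against 2.0 for T2 and T3, so it is selected. For every candidate, the gap between `expected_rf` and `distance_to_mean` is the same constant, 0.75. This gap is the sum of p(1 - p) over the split frequencies p. Ranking candidates by distance to the mean vector therefore gives the same order as ranking by expected distance.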