Department of Computer Science, University of Auckland, Auckland, New Zealand.
BMC Evol Biol. 2013 Oct 4;13:221. doi: 10.1186/1471-2148-13-221.
Bayesian phylogenetic analysis generates a set of trees which are often condensed into a single tree representing the whole set. Many methods exist for selecting a representative topology for a set of unrooted trees, few exist for assigning branch lengths to a fixed topology, and even fewer for simultaneously setting the topology and branch lengths. However, there is very little research into locating a good representative for a set of rooted time trees like the ones obtained from a BEAST analysis.
We empirically compare new and known methods for generating a summary tree. Some new methods are motivated by mathematical constructions such as tree metrics, while the rest employ tree concepts which work well in practice. These use more of the posterior than existing methods, which discard information not directly mapped to the chosen topology. Using results from a large number of simulations, we assess the quality of a summary tree, measuring (a) how well it explains the sequence data under the model and (b) how close it is to the "truth", i.e., to the tree used to generate the sequences.
Our simulations indicate that no single method is "best". Methods producing good divergence time estimates have poor branch lengths and lower model fit, and vice versa. Using the results presented here, a user can choose the appropriate method based on the purpose of the summary tree.
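To make the idea of condensing a set of rooted trees into one representative concrete, the following minimal sketch picks the sampled tree whose clades are most frequent across the sample, in the spirit of the widely used maximum clade credibility summary. It is an illustration only, not one of the methods evaluated here: the nested-tuple tree encoding and the function names `clades` and `max_clade_credibility` are assumptions made for this example, and the sketch considers topology alone, ignoring the node heights and branch lengths that a time-tree summary must also assign.

```python
import math
from collections import Counter


def clades(tree):
    """Return the clades (frozensets of leaf names) of a rooted tree.

    Assumed encoding: a tree is either a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        # An internal node's clade is the union of its children's leaf sets.
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def max_clade_credibility(trees):
    """Pick the sampled tree with the highest summed log clade frequency."""
    clade_sets = [clades(t) for t in trees]
    counts = Counter(c for cs in clade_sets for c in cs)
    n = len(trees)

    def score(cs):
        # Log of the product of the sample frequencies of the tree's clades.
        return sum(math.log(counts[c] / n) for c in cs)

    best = max(range(n), key=lambda i: score(clade_sets[i]))
    return trees[best], score(clade_sets[best])


# Hypothetical toy sample of three rooted topologies on taxa A, B, C.
sample = [(("A", "B"), "C"), (("A", "B"), "C"), ("A", ("B", "C"))]
best_tree, best_score = max_clade_credibility(sample)
print(best_tree, best_score)
```

Because the returned tree is always one of the sampled topologies, information from trees with other topologies enters only through the clade counts, which is the kind of information loss that the comparison above is concerned with.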