Fischer Mareike, Galla Michelle, Herbst Lina, Steel Mike
Department for Mathematics and Computer Science, Ernst-Moritz-Arndt University, Greifswald, Germany.
Allan Wilson Centre, University of Canterbury, Christchurch, New Zealand.
Mol Phylogenet Evol. 2014 Nov;80:165-8. doi: 10.1016/j.ympev.2014.07.010. Epub 2014 Jul 29.
Applying a method to reconstruct a phylogenetic tree from random data provides a way to detect whether that method has an inherent bias towards certain tree 'shapes'. For maximum parsimony, applied to a sequence of random 2-state data, each possible binary phylogenetic tree has exactly the same distribution for its parsimony score. Despite this pleasing and slightly surprising symmetry, some binary phylogenetic trees are more likely than others to be a most parsimonious (MP) tree for a sequence of k such characters, as we show. For k=2, and unrooted binary trees on six taxa, any tree with a caterpillar shape has a higher chance of being an MP tree than any tree with a symmetric shape. On the other hand, if we take any two binary trees, on any number of taxa, we prove that this bias between the two trees vanishes as the number of characters k grows. However, again there is a twist: MP trees on six taxa for k=2 random binary characters are more likely to have certain shapes than a uniform distribution on binary phylogenetic trees predicts. Moreover, this shape bias appears, from simulations, to be more pronounced for larger values of k.
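The score symmetry stated above can be checked by brute force for six taxa. The following is a minimal sketch, not the authors' code: it assumes the Fitch algorithm as the way to compute parsimony scores, and the tree encodings and the names CATERPILLAR, SYMMETRIC, fitch and score_distribution are illustrative choices. It tabulates the parsimony score of all 2^6 binary characters on one caterpillar tree and one symmetric tree.

```python
from collections import Counter
from itertools import product

# Two unrooted tree shapes on six taxa (labelled 0..5), written as nested
# pairs rooted on an arbitrary edge. The Fitch score of a character does
# not depend on where the root is placed.
CATERPILLAR = (((0, 1), 2), (3, (4, 5)))   # cherries {0,1} and {4,5}
SYMMETRIC = ((0, 1), ((2, 3), (4, 5)))     # three cherries around a centre


def fitch(tree, character):
    """Return (state set, parsimony score) of `character` on `tree`.

    `character` is a tuple giving the state (0 or 1) of each taxon.
    """
    if isinstance(tree, int):               # leaf: its own state, no cost
        return {character[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, character)
    right_set, right_cost = fitch(right, character)
    common = left_set & right_set
    if common:                              # children agree: no extra change
        return common, left_cost + right_cost
    # children disagree: one extra state change is needed
    return left_set | right_set, left_cost + right_cost + 1


def score_distribution(tree, n_taxa=6):
    """Count how many of the 2**n_taxa binary characters have each score."""
    return Counter(
        fitch(tree, character)[1]
        for character in product((0, 1), repeat=n_taxa)
    )


if __name__ == "__main__":
    for name, tree in (("caterpillar", CATERPILLAR), ("symmetric", SYMMETRIC)):
        print(name, sorted(score_distribution(tree).items()))
```

Both trees yield the same score counts, which illustrates the symmetry for a single character. The sketch does not address which tree is most parsimonious for k=2 characters; that comparison would require minimising the summed score over all 105 unrooted binary trees on six taxa.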