School of Biological Sciences, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK.
Department of Evolutionary Biology, EBC, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden.
Syst Biol. 2020 Sep 1;69(5):863-883. doi: 10.1093/sysbio/syaa003.
In recent years, there has been controversy over whether multidimensional data such as geometric morphometric data or information on gene expression can be used for estimating phylogenies. This study uses simulations of evolution in multidimensional phenotype spaces to address this question and to identify specific factors that are important for answering it. Most of the simulations use phylogenies with four taxa, so that there are just three possible unrooted trees and the effect of different combinations of branch lengths can be studied systematically. In a comparison of methods, squared-change parsimony performed about as well as maximum likelihood, and both methods outperformed Wagner and Euclidean parsimony, neighbor-joining and UPGMA. Under an evolutionary model of isotropic Brownian motion, phylogeny can be estimated reliably if dimensionality is high, even with relatively unfavorable combinations of branch lengths. By contrast, if there is phenotypic integration such that most variation is concentrated in one or a few dimensions, the reliability of phylogenetic estimates is severely reduced. Evolutionary models with stabilizing selection also produce highly unreliable estimates, which are little better than picking a phylogenetic tree at random. To examine how these results apply to phylogenies with more than four taxa, we conducted further simulations with up to eight taxa. These indicated that the effects of dimensionality and phenotypic integration extend to more than four taxa, and that convergence among internal nodes may produce additional complications specifically for greater numbers of taxa. Overall, the simulations suggest that multidimensional data, under evolutionary models that are plausible for biological data, do not produce reliable estimates of phylogeny. [Brownian motion; gene expression data; geometric morphometrics; morphological integration; squared-change parsimony; phylogeny; shape; stabilizing selection.]
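The four-taxon simulations described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, branch lengths, and replicate counts are hypothetical, and it assumes unweighted squared-change parsimony (every branch counted equally), for which the two internal nodes of a four-taxon tree have a closed-form optimum. Tips evolve by isotropic Brownian motion on the true unrooted tree AB|CD, and the estimate is whichever of the three topologies has the smallest sum of squared changes.

```python
import numpy as np

# The three possible unrooted topologies for tips A, B, C, D (indices 0..3).
TOPOLOGIES = {
    "AB|CD": ((0, 1), (2, 3)),
    "AC|BD": ((0, 2), (1, 3)),
    "AD|BC": ((0, 3), (1, 2)),
}


def simulate_four_taxa(rng, dim, tip_lengths, internal_length):
    """Isotropic Brownian motion on the true tree AB|CD.

    Returns a (4, dim) array of tip phenotypes in the order A, B, C, D.
    Branch lengths are variances per dimension (an assumed convention).
    """
    u = np.zeros(dim)  # internal node joining A and B
    v = u + rng.normal(scale=np.sqrt(internal_length), size=dim)  # joins C and D
    parents = (u, u, v, v)
    return np.array([
        parent + rng.normal(scale=np.sqrt(t), size=dim)
        for parent, t in zip(parents, tip_lengths)
    ])


def scp_length(tips, pairs):
    """Minimum sum of squared changes over all five branches of one topology."""
    (i, j), (k, l) = pairs
    s1 = tips[i] + tips[j]
    s2 = tips[k] + tips[l]
    # Setting the gradient to zero gives 3u = s1 + v and 3v = s2 + u.
    u = (3.0 * s1 + s2) / 8.0
    v = (3.0 * s2 + s1) / 8.0
    branches = (tips[i] - u, tips[j] - u, u - v, tips[k] - v, tips[l] - v)
    return float(sum(np.dot(b, b) for b in branches))


def best_topology(tips):
    """Name of the topology with the smallest squared-change tree length."""
    return min(TOPOLOGIES, key=lambda name: scp_length(tips, TOPOLOGIES[name]))


def recovery_rate(dim, tip_lengths=(1.0, 1.0, 1.0, 1.0),
                  internal_length=0.2, n_rep=500, seed=0):
    """Fraction of replicates in which the true tree AB|CD is recovered."""
    rng = np.random.default_rng(seed)
    hits = sum(
        best_topology(simulate_four_taxa(rng, dim, tip_lengths, internal_length)) == "AB|CD"
        for _ in range(n_rep)
    )
    return hits / n_rep
```

Comparing `recovery_rate(dim=2)` with `recovery_rate(dim=200)` gives a rough feel for the dimensionality effect under isotropic Brownian motion. Phenotypic integration or stabilizing selection would require a different simulation step (for example, unequal variances per dimension), which this sketch does not attempt.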