Department of Earth Sciences, Durham University, Durham, UK.
Syst Biol. 2022 Aug 10;71(5):1255-1270. doi: 10.1093/sysbio/syab100.
Phylogenetic analyses often produce large numbers of trees. Mapping the distribution of trees in "tree space" can illuminate the behavior and performance of search strategies, reveal distinct clusters of optimal trees, and expose differences between data sources or phylogenetic methods, but the high-dimensional spaces defined by metric distances are necessarily distorted when represented in fewer dimensions. Here, I explore the consequences of this transformation in phylogenetic search results from 128 morphological data sets, using stratigraphic congruence (a complementary aspect of tree similarity) to evaluate the utility of low-dimensional mappings. I find that phylogenetic similarities between cladograms are most accurately depicted in tree spaces derived from information-theoretic tree distances or the quartet distance. Robinson-Foulds tree spaces exhibit prominent distortions and often fail to group trees according to phylogenetic similarity, whereas the strong influence of tree shape on the Kendall-Colijn distance makes its tree space unsuitable for many purposes. Distances mapped into two or even three dimensions often correspond poorly to true distances, which can profoundly misrepresent clustering structure. Without explicit testing, one cannot be confident that a tree space mapping faithfully represents the true distribution of trees, nor that visually evident structure is valid. My recommendations for tree space validation and visualization are implemented in a new graphical user interface in the "TreeDist" R package. [Multidimensional scaling; phylogenetic software; tree distance metrics; treespace projections.]
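The abstract argues that a low-dimensional mapping must be tested before its structure is trusted. As a minimal, hedged illustration of what such a test can look like, the sketch below computes two generic fidelity measures for any mapping: the Pearson correlation between original and mapped pairwise distances, and the trustworthiness measure of Venna and Kaski, which penalizes points that look like near neighbours in the mapping but are distant in the original space. These are assumptions chosen for illustration and are not claimed to be the exact statistics used in the paper or in TreeDist. The function names are hypothetical, the toy input is a set of 3D points rather than trees (in practice `d_true` would be a precomputed tree distance matrix), and "projection" by dropping a coordinate stands in for a real method such as multidimensional scaling.

```python
import math
from itertools import combinations


def euclidean_distances(points):
    """Full pairwise Euclidean distance matrix for a list of coordinate tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def distance_correlation(d_true, d_mapped):
    """Pearson correlation between upper-triangle entries of two distance matrices.

    Values near 1 mean relative distances are preserved; low or negative
    values flag distortion introduced by the mapping.
    """
    pairs = list(combinations(range(len(d_true)), 2))
    x = [d_true[i][j] for i, j in pairs]
    y = [d_mapped[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def trustworthiness(d_true, d_mapped, k):
    """Trustworthiness of a mapping for neighbourhood size k (requires k < n / 2).

    Returns 1.0 when every point's k nearest neighbours in the mapping are
    also among its k nearest neighbours in the original space.
    """
    n = len(d_true)
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")

    def neighbour_order(d, i):
        # Other points sorted by distance from i; ties broken by index.
        return sorted((j for j in range(n) if j != i), key=lambda j: (d[i][j], j))

    penalty = 0
    for i in range(n):
        true_order = neighbour_order(d_true, i)
        rank = {j: r for r, j in enumerate(true_order, start=1)}
        true_knn = set(true_order[:k])
        mapped_knn = neighbour_order(d_mapped, i)[:k]
        # Each false neighbour is penalized by how far outside the true k-neighbourhood it lies.
        penalty += sum(rank[j] - k for j in mapped_knn if j not in true_knn)
    return 1 - 2 * penalty / (n * k * (2 * n - 3 * k - 1))


# Toy example: points 4 and 5 lie "above" points 0 and 3 and collapse onto them
# when the third coordinate is discarded.
points_3d = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 4), (1, 1, 4)]
points_2d = [(x, y) for x, y, _ in points_3d]
d_true = euclidean_distances(points_3d)
d_mapped = euclidean_distances(points_2d)

if __name__ == "__main__":
    print(round(distance_correlation(d_true, d_mapped), 3))
    print(round(trustworthiness(d_true, d_mapped, k=1), 3))  # → 0.667
```

In this toy case the projection places distant points on top of one another, so both measures fall well below 1 even though the 2D picture looks tidy, which is the kind of hidden distortion the abstract warns about.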