School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
PLoS One. 2019 Nov 21;14(11):e0224197. doi: 10.1371/journal.pone.0224197. eCollection 2019.
Phylogenetic trees are frequently used in biology to study the relationships among species or organisms. The shape of a phylogenetic tree carries information about patterns of speciation and extinction, so powerful tools are needed to investigate it. Tree shape statistics are a common way to quantify the shape of a phylogenetic tree by encoding it as a single number. In this article, we propose a new resolution function to evaluate the power of different tree shape statistics to distinguish between dissimilar trees. We show that the new resolution function requires less time and space than the previously proposed resolution function for tree shape statistics. We also introduce a new class of tree shape statistics, each a linear combination of two existing statistics that is optimal with respect to a resolution function, and show evidence that the statistics in this class converge to a limiting linear combination as the size of the tree increases. Our implementation is freely available at https://github.com/WGS-TB/TreeShapeStats.
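As a minimal illustration of encoding a tree shape with a single number, the sketch below computes two widely used tree shape statistics, the Sackin and Colless indices, and a linear combination of them. The abstract does not say which two statistics are combined or what the optimal weight is, so the choice of Sackin and Colless, the nested-tuple tree encoding, and the `weight` parameter are all assumptions made for this example. It is not taken from the TreeShapeStats repository.

```python
# Illustrative sketch only: trees are nested 2-tuples, leaves are strings.
# This encoding and the function names are assumptions, not the authors' code.

def leaf_count(tree):
    """Number of leaves below (and including) this node."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)

def sackin(tree, depth=0):
    """Sackin index: sum of leaf depths (edges from the root to each leaf)."""
    if not isinstance(tree, tuple):
        return depth
    left, right = tree
    return sackin(left, depth + 1) + sackin(right, depth + 1)

def colless(tree):
    """Colless index: sum over internal nodes of |leaves left - leaves right|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    imbalance = abs(leaf_count(left) - leaf_count(right))
    return imbalance + colless(left) + colless(right)

def combined(tree, weight):
    """Linear combination of the two statistics.

    `weight` is a hypothetical free parameter; the article selects it by
    optimizing a resolution function, which is not reproduced here.
    """
    return sackin(tree) + weight * colless(tree)

# Two four-leaf trees with different shapes.
caterpillar = ((("a", "b"), "c"), "d")   # maximally unbalanced
balanced = (("a", "b"), ("c", "d"))      # fully balanced

print(sackin(caterpillar), colless(caterpillar))  # 9 3
print(sackin(balanced), colless(balanced))        # 8 0
print(combined(caterpillar, 0.5), combined(balanced, 0.5))  # 10.5 8.0
```

Both statistics give the unbalanced tree a larger value than the balanced one, and the combined value changes with the weight, which is the quantity a resolution function would be used to tune.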