Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada.
Syst Biol. 2022 Oct 12;71(6):1378-1390. doi: 10.1093/sysbio/syac008.
Phylogenetic trees are a central tool in many areas of life science and medicine. They depict evolutionary relationships among species and genes, and patterns of ancestry among sets of individuals. The shapes and branch lengths of phylogenetic trees encode evolutionary and epidemiological information, and extracting that information requires methods for representing and comparing trees. Both tasks are challenging: a tree shape is unlabeled and can be drawn in many different forms, and branch lengths are attached to edges whose positions change with the drawing. In this article, we introduce representation and comparison methods for rooted unlabeled phylogenetic trees based on two tools: a tree lattice, which serves as a coordinate system for rooted binary trees with branch lengths, and a graph polynomial, which fully characterizes tree shapes. We show that the resulting tree representations and metrics provide distance-based, likelihood-free methods for tree clustering, parameter estimation, and model selection, and we apply the methods to phylogenies reconstructed from virus sequences. [Graph polynomial; likelihood-free inference; phylogenetics; tree lattice; tree metrics.]
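To illustrate how a polynomial can represent an unlabeled tree shape independently of how the tree is drawn, the following is a minimal sketch. The abstract does not state the definition of the graph polynomial, so the recursion used here is an assumption: a leaf is assigned x, and an internal node is assigned y plus the product of its children's polynomials. The nested-tuple tree encoding and the names `poly_mul` and `tree_polynomial` are hypothetical choices made for this example only.

```python
from collections import defaultdict


def poly_mul(p, q):
    """Multiply two bivariate polynomials stored as {(deg_x, deg_y): coeff}."""
    out = defaultdict(int)
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] += c * f
    return dict(out)


def tree_polynomial(tree):
    """Polynomial of a rooted tree given as nested tuples.

    A leaf is the empty tuple (); an internal node is a tuple of subtrees.
    Assumed recursion (not taken from the abstract):
        P(leaf) = x
        P(node) = y + product of the children's polynomials
    """
    if not tree:
        return {(1, 0): 1}  # the monomial x
    prod = {(0, 0): 1}  # the constant 1
    for child in tree:
        prod = poly_mul(prod, tree_polynomial(child))
    result = defaultdict(int, prod)
    result[(0, 1)] += 1  # add the monomial y
    return dict(result)


# Two drawings of the same three-leaf shape, plus two four-leaf shapes.
left_drawn = (((), ()), ())
right_drawn = ((), ((), ()))
balanced4 = (((), ()), ((), ()))
caterpillar4 = ((((), ()), ()), ())

# Multiplication is commutative, so the child order does not matter.
print(tree_polynomial(left_drawn) == tree_polynomial(right_drawn))
# Different shapes on the same number of leaves get different polynomials here.
print(tree_polynomial(balanced4) == tree_polynomial(caterpillar4))
```

Because the product over children is commutative, the two drawings of the three-leaf tree produce the same coefficient dictionary, while the balanced and caterpillar four-leaf shapes do not. Branch lengths are not handled in this sketch.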