Garba Maryam K, Nye Tom M W, Boys Richard J
School of Mathematics & Statistics, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
Department of Mathematical Sciences, Bayero University, Kano, Nigeria.
Syst Biol. 2018 Mar 1;67(2):320-327. doi: 10.1093/sysbio/syx080.
Most existing measures of distance between phylogenetic trees are based on the geometry or topology of the trees. Instead, we consider distance measures which are based on the underlying probability distributions on genetic sequence data induced by trees. Monte Carlo schemes are necessary to calculate these distances approximately, and we describe efficient sampling procedures. Key features of the distances are the ability to include substitution model parameters and to handle trees with different taxon sets in a principled way. We demonstrate some of the properties of these new distance measures and compare them to existing distances, in particular by applying multidimensional scaling to data sets previously reported as containing phylogenetic islands. [Metric; probability distribution; multidimensional scaling; information geometry.]
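To illustrate the idea of approximating a distance between two probability distributions by Monte Carlo sampling, the following is a minimal sketch and not the authors' procedure. It assumes the Hellinger distance as the metric (the abstract does not name one) and uses two small categorical distributions, `p` and `q`, as hypothetical stand-ins for the site-pattern distributions that two trees would induce. The function names are illustrative only.

```python
import math
import random


def hellinger_exact(p, q):
    """Exact Hellinger distance between two discrete distributions."""
    # Bhattacharyya coefficient: sum over outcomes of sqrt(p_i * q_i).
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def hellinger_mc(p, q, n_samples, rng):
    """Monte Carlo estimate of the Hellinger distance.

    Uses the identity sum_i sqrt(p_i * q_i) = E_p[sqrt(q(X) / p(X))],
    so the coefficient can be estimated by averaging over draws X ~ p.
    """
    outcomes = range(len(p))
    draws = rng.choices(outcomes, weights=p, k=n_samples)
    bc_hat = sum(math.sqrt(q[i] / p[i]) for i in draws) / n_samples
    return math.sqrt(max(0.0, 1.0 - bc_hat))


# Hypothetical toy distributions over four outcomes (assumption: in a
# phylogenetic setting the outcomes would be site patterns, and the
# probabilities would come from a substitution model on each tree).
p = [0.4, 0.3, 0.2, 0.1]
q = [0.1, 0.2, 0.3, 0.4]

rng = random.Random(0)
exact = hellinger_exact(p, q)
estimate = hellinger_mc(p, q, n_samples=200_000, rng=rng)
print(f"exact = {exact:.4f}, Monte Carlo estimate = {estimate:.4f}")
```

In a realistic setting the space of site patterns grows exponentially with the number of taxa, so the exact sum is infeasible and only the sampling-based estimate remains practical; the toy example keeps both so the estimate can be checked against the exact value.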