

Information geometry for phylogenetic trees.

Author information

School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK.

Department of Mathematical Sciences, Bayero University, Kano, Nigeria.

Publication information

J Math Biol. 2021 Feb 15;82(3):19. doi: 10.1007/s00285-021-01553-x.

Abstract

We propose a new space of phylogenetic trees which we call wald space. The motivation is to develop a space suitable for statistical analysis of phylogenies, but with a geometry based on more biologically principled assumptions than existing spaces: in wald space, trees are close if they induce similar distributions on genetic sequence data. As a point set, wald space contains the previously developed Billera-Holmes-Vogtmann (BHV) tree space; it also contains disconnected forests, like the edge-product (EP) space but without certain singularities of the EP space. We investigate two related geometries on wald space. The first is the geometry of the Fisher information metric of character distributions induced by the two-state symmetric Markov substitution process on each tree. Infinitesimally, the metric is proportional to the Kullback-Leibler divergence, or equivalently, as we show, to any f-divergence. The second geometry is obtained analogously but using a related continuous-valued Gaussian process on each tree, and it can be viewed as the trace metric of the affine-invariant metric for covariance matrices. We derive a gradient descent algorithm to project from the ambient space of covariance matrices to wald space. For both geometries we derive computational methods to compute geodesics in polynomial time and show numerically that the two information geometries (discrete and continuous) are very similar. In particular, geodesics are approximated extrinsically. Comparison with the BHV geometry shows that our canonical and biologically motivated space is substantially different.
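The affine-invariant metric on covariance matrices mentioned above has a well-known closed-form distance on the ambient space of symmetric positive definite matrices: d(A, B) is the Frobenius norm of log(A^{-1/2} B A^{-1/2}). The following is a minimal sketch of that ambient distance only, not the authors' algorithm for geodesics within wald space. The function name `affine_invariant_distance` and the example matrices are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def affine_invariant_distance(a, b):
    """Affine-invariant distance between two symmetric positive definite matrices.

    Illustrative helper (name is an assumption, not taken from the paper).
    Computes sqrt(sum_i log(lambda_i)^2), where lambda_i are the eigenvalues
    of a^{-1} b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    chol = np.linalg.cholesky(a)  # a = chol @ chol.T, chol lower triangular
    # Whitened matrix chol^{-1} b chol^{-T}; it is symmetric and has the
    # same eigenvalues as a^{-1} b.
    half = np.linalg.solve(chol, b)
    whitened = np.linalg.solve(chol, half.T).T
    # Symmetrize to suppress round-off before the symmetric eigensolver.
    eigenvalues = np.linalg.eigvalsh((whitened + whitened.T) / 2.0)
    return float(np.sqrt(np.sum(np.log(eigenvalues) ** 2)))


# Example with two hypothetical 3x3 covariance matrices.
cov_a = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.2],
                  [0.2, 0.2, 1.0]])
cov_b = np.array([[1.0, 0.3, 0.3],
                  [0.3, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
print(affine_invariant_distance(cov_a, cov_b))
```

The Cholesky-based whitening avoids forming a matrix square root explicitly. The distance is unchanged when both arguments are replaced by G A Gᵀ and G B Gᵀ for any invertible G, which is the invariance the metric is named after.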


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4568/7884381/352f6464318b/285_2021_1553_Fig1_HTML.jpg
