Edelsbrunner Herbert, Ölsböck Katharina, Wagner Hubert
ISTA (Institute of Science and Technology Austria), 3400 Klosterneuburg, Austria.
Department of Mathematics, University of Florida, Gainesville, FL 32611, USA.
Entropy (Basel). 2024 Jul 27;26(8):637. doi: 10.3390/e26080637.
Methods used in topological data analysis naturally capture higher-order interactions in point cloud data embedded in a metric space. This methodology was recently extended to data living in an information space, by which we mean a space measured with an information-theoretic distance. One such setting is a finite collection of discrete probability distributions embedded in the probability simplex and measured with the relative entropy (Kullback-Leibler divergence). More generally, one can work with a Bregman divergence parameterized by a different notion of entropy. While theoretical algorithms exist for this setup, there is a paucity of implementations for exploring and comparing geometric-topological properties of various information spaces. The contribution of this work is therefore twofold. First, we propose the first robust algorithms and software for geometric and topological data analysis in information space. Perhaps surprisingly, despite working with Bregman divergences, our design reuses robust libraries for the Euclidean case. Second, using the new software, we take the first steps towards understanding the geometric-topological structure of these spaces. In particular, we compare them with the more familiar spaces equipped with the Euclidean and Fisher metrics.
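To make the relationship between the relative entropy and Bregman divergences concrete, the following is a minimal sketch, not the authors' software. It assumes the standard definition D_F(p || q) = F(p) - F(q) - <grad F(q), p - q> for a convex generator F, and all function names are hypothetical. With F taken as the negative Shannon entropy, the divergence reduces to the Kullback-Leibler divergence on the probability simplex; with F taken as half the squared Euclidean norm, it reduces to half the squared Euclidean distance.

```python
import numpy as np


def bregman_divergence(F, grad_F, p, q):
    """Bregman divergence D_F(p || q) = F(p) - F(q) - <grad F(q), p - q>."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(F(p) - F(q) - np.dot(grad_F(q), p - q))


# Generator 1: negative Shannon entropy (requires strictly positive entries).
def neg_shannon_entropy(x):
    return float(np.sum(x * np.log(x)))


def grad_neg_shannon_entropy(x):
    return np.log(x) + 1.0


# Generator 2: half the squared Euclidean norm.
def half_sq_norm(x):
    return 0.5 * float(np.dot(x, x))


def grad_half_sq_norm(x):
    return x


def kl_divergence(p, q):
    """Relative entropy computed directly, for comparison."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


# Two illustrative discrete distributions in the interior of the 2-simplex.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])

d_kl = bregman_divergence(neg_shannon_entropy, grad_neg_shannon_entropy, p, q)
d_kl_reversed = bregman_divergence(neg_shannon_entropy, grad_neg_shannon_entropy, q, p)
d_euc = bregman_divergence(half_sq_norm, grad_half_sq_norm, p, q)

print("Bregman (neg. entropy):", d_kl, "direct KL:", kl_divergence(p, q))
print("Reversed arguments:", d_kl_reversed)
print("Bregman (half sq. norm):", d_euc)
```

The sketch also shows that the divergence is not symmetric in its arguments, which is one reason such spaces are not metric spaces in the usual sense.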