Li Lu, Thompson Connor, Henselman-Petrusek Gregory, Giusti Chad, Ziegelmeier Lori
Mathematics, Statistics, and Computer Science Department, Macalester College, Saint Paul, MN, United States.
Department of Mathematics, Purdue University, West Lafayette, IN, United States.
Front Artif Intell. 2021 Oct 11;4:681117. doi: 10.3389/frai.2021.681117. eCollection 2021.
Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we provide a study of the effectiveness and computational cost of several minimization optimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices. Our key findings are: 1) optimization is effective in reducing the size of cycle representatives, though the extent of the reduction varies according to the dimension and distribution of the underlying data, 2) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis, in most data sets we consider, 3) the choice of linear solvers matters a lot to the computation time of optimizing cycles, 4) the computation time of solving an integer program is not significantly longer than the computation time of solving a linear program for most of the cycle representatives, using the Gurobi linear solver, 5) strikingly, whether requiring integer solutions or not, we almost always obtain a solution with the same cost and almost all solutions found have entries in and therefore, are also solutions to a restricted optimization problem, and 6) we obtain qualitatively different results for generators in Erdős-Rényi random clique complexes than in real-world and synthetic point cloud data.
持久同调类的循环代表可用于描述数据中的拓扑特征。然而,这些代表的非唯一性会产生模糊性,并可能导致对同一组类有许多不同的解释。解决这个问题的一种方法是根据在数据背景下有意义的某种度量来优化代表的选择。在这项工作中,我们研究了几种最小化优化程序在构建一维有理系数持久同调的同调循环基方面的有效性和计算成本,包括均匀加权和长度加权的边损失算法以及均匀加权和面积加权的三角形损失算法。我们通过标准线性规划方法进行这些优化,应用通用求解器在单纯形边界矩阵的列基上进行优化。我们的主要发现是:1)优化在减小循环代表的规模方面是有效的,尽管减小的程度根据基础数据的维度和分布而有所不同;2)在我们考虑的大多数数据集中,优化循环代表基的计算成本超过了计算这样一个基的成本;3)线性求解器的选择对优化循环的计算时间有很大影响;4)使用Gurobi线性求解器,对于大多数循环代表,求解整数规划的计算时间并不比求解线性规划的计算时间长得多;5)引人注目的是,无论是否要求整数解,我们几乎总是得到具有相同成本的解,并且找到的几乎所有解的元素都在 中,因此,它们也是一个受限优化问题的解;6)对于厄多斯 - 雷尼随机团复形中的生成元,我们得到的结果与真实世界和合成点云数据中的结果在性质上不同。