Baptista Anthony, Barp Alessandro, Chakraborti Tapabrata, Harbron Chris, MacArthur Ben D, Banerji Christopher R S
The Alan Turing Institute, The British Library, London, NW1 2DB, UK.
School of Mathematical Sciences, Queen Mary University of London, London, E1 4NS, UK.
Sci Rep. 2024 Oct 8;14(1):23383. doi: 10.1038/s41598-024-74045-9.
Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data. It is known that data passing through a trained DNN classifier undergoes a series of geometric and topological simplifications. While some progress has been made toward understanding these transformations in neural networks with smooth activation functions, an understanding is still needed in the more general setting of non-smooth activation functions, such as the rectified linear unit (ReLU), which tend to perform better. Here we propose that the geometric transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow, a tool from differential geometry that evolves a manifold by smoothing its curvature in order to identify its topology. To illustrate this idea, we present a computational framework to quantify the geometric changes that occur as data passes through successive layers of a DNN, and use this framework to motivate a notion of 'global Ricci network flow' that can be used to assess a DNN's ability to disentangle complex data geometries to solve classification problems. By training more than 1,500 DNN classifiers of different widths and depths on synthetic and real-world data, we show that the strength of global Ricci network flow-like behaviour correlates with accuracy for well-trained DNNs, independently of depth, width and data set. Our findings motivate the application of tools from differential and discrete geometry to the problem of explainability in deep learning.
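The idea of quantifying geometric change layer by layer can be sketched as follows. This is not the authors' Ricci-flow-based framework; it is a minimal illustration, assuming a crude geometric summary (mean within-class vs. between-class distances) tracked through untrained random ReLU layers on a hypothetical "two concentric circles" data set. All function names and parameters here are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: two concentric circles, a classic example
# of classes that are entangled in the input geometry.
n = 200
theta = rng.uniform(0, 2 * np.pi, n)
inner = np.c_[np.cos(theta[: n // 2]), np.sin(theta[: n // 2])]
outer = 2.5 * np.c_[np.cos(theta[n // 2 :]), np.sin(theta[n // 2 :])]
X = np.vstack([inner, outer])
y = np.r_[np.zeros(n // 2), np.ones(n // 2)]

def relu_layer(H, d_out, rng):
    """One random (untrained) ReLU layer, for illustration only."""
    W = rng.normal(scale=1 / np.sqrt(H.shape[1]), size=(H.shape[1], d_out))
    return np.maximum(H @ W, 0.0)

def mean_pairwise_dist(A, B):
    """Mean Euclidean distance between rows of A and rows of B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.mean()

# Track a simple geometric summary of the representation at each depth:
# a small within/between ratio suggests the classes are being separated.
H = X
for depth in range(3):
    within = 0.5 * (
        mean_pairwise_dist(H[y == 0], H[y == 0])
        + mean_pairwise_dist(H[y == 1], H[y == 1])
    )
    between = mean_pairwise_dist(H[y == 0], H[y == 1])
    print(f"layer {depth}: within/between distance ratio = {within / between:.3f}")
    H = relu_layer(H, 16, rng)
```

The paper's framework instead uses curvature-based quantities motivated by Ricci flow; the point of this sketch is only the pattern of measuring a geometric statistic of the representation at every layer and watching how it evolves.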