Sahs Justin, Pyle Ryan, Damaraju Aneel, Caro Josue Ortega, Tavaslioglu Onur, Lu Andy, Anselmi Fabio, Patel Ankit B
Department of Neuroscience, Baylor College of Medicine, Houston, TX, United States.
Department of Electrical Engineering, Rice University, Houston, TX, United States.
Front Artif Intell. 2022 May 11;5:889981. doi: 10.3389/frai.2022.889981. eCollection 2022.
Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function they represent. This opacity stems in part from symmetries inherent in the NN parameterization: many different parameter settings realize the same output function, which both obscures the parameter-function relationship and introduces redundant degrees of freedom. The NN parameterization is invariant under two symmetries: permutation of the neurons and a continuous family of rescalings of the weight and bias parameters. We propose taking a quotient with respect to the second symmetry group and reparametrizing ReLU NNs as continuous piecewise-linear splines. Through this spline lens, we study learning dynamics in shallow univariate ReLU NNs, uncovering unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, its Hessian, and the Hessian spectrum. We also show that standard weight initializations yield very flat initial functions, and that this flatness, together with overparametrization and the initial weight scale, determines the strength and type of implicit regularization, consistent with previous work. Our implicit regularization results complement recent kernel-based work showing that initialization scale critically controls implicit regularization. Overall, removing the weight-scale symmetry lets us prove these results more simply, establish new results and insights, and offer a far more transparent and intuitive picture. Looking forward, our quotiented spline-based approach will extend naturally to the multivariate and deep settings, and alongside the kernel-based view, we believe it will play a foundational role in efforts to understand neural networks. Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2.
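To make the scale symmetry concrete, here is a minimal numerical sketch (our own illustration, not code from the paper): for a shallow univariate ReLU network f(x) = sum_i v_i * ReLU(w_i*x + b_i), positive homogeneity ReLU(a*z) = a*ReLU(z) for a > 0 means that rescaling each neuron's incoming weight and bias by a_i > 0 while dividing its outgoing weight by a_i leaves the represented function unchanged.

    # Sketch: weight-scale symmetry of a shallow univariate ReLU network.
    # f(x) = sum_i v_i * relu(w_i*x + b_i); rescaling (w_i, b_i) -> (a_i*w_i, a_i*b_i)
    # and v_i -> v_i/a_i with a_i > 0 leaves f unchanged.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 8                                  # hidden width (arbitrary)
    w, b, v = rng.normal(size=(3, n))      # first-layer weights/biases, output weights

    def f(x, w, b, v):
        # x: (m,) inputs; returns (m,) network outputs
        return np.maximum(w * x[:, None] + b, 0.0) @ v

    a = rng.uniform(0.5, 2.0, size=n)      # per-neuron positive rescalings
    x = np.linspace(-3, 3, 101)
    assert np.allclose(f(x, w, b, v), f(x, a * w, a * b, v / a))  # same function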
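Quotienting by this symmetry amounts to mapping each neuron to scale-invariant spline coordinates. The sketch below (assumed notation, not necessarily the paper's exact construction) uses the identity ReLU(w*x + b) = |w| * ReLU(s*(x - beta)) with s = sign(w) and breakpoint beta = -b/w, so each neuron reduces to a hinge at beta with magnitude mu = v*|w|; the triple (beta, mu, s) is unchanged by the rescaling shown above.

    # Sketch: a quotient map to spline (hinge) coordinates for the same network.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 8
    w, b, v = rng.normal(size=(3, n))

    def f_relu(x, w, b, v):
        # f(x) = sum_i v_i * relu(w_i*x + b_i)
        return np.maximum(w * x[:, None] + b, 0.0) @ v

    def to_spline(w, b, v):
        # relu(w*x + b) = |w| * relu(s*(x - beta)), s = sign(w), beta = -b/w
        s = np.sign(w)
        beta = -b / w              # hinge (breakpoint) location; assumes w != 0
        mu = v * np.abs(w)         # hinge magnitude, invariant under rescaling
        return beta, mu, s

    def f_spline(x, beta, mu, s):
        return np.maximum(s * (x[:, None] - beta), 0.0) @ mu

    x = np.linspace(-3, 3, 101)
    beta, mu, s = to_spline(w, b, v)
    assert np.allclose(f_relu(x, w, b, v), f_spline(x, beta, mu, s))

    # The spline coordinates are invariant under (w, b, v) -> (a*w, a*b, v/a), a > 0:
    a = rng.uniform(0.5, 2.0, size=n)
    beta2, mu2, _ = to_spline(a * w, a * b, v / a)
    assert np.allclose(beta, beta2) and np.allclose(mu, mu2)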