Lee Sangmin, Sim Byeongsu, Ye Jong Chul
Department of Mathematical Sciences, KAIST, Daejeon, Republic of Korea.
Kim Jaechul Graduate School of AI, KAIST, Daejeon, Republic of Korea.
Neural Netw. 2024 Oct;178:106435. doi: 10.1016/j.neunet.2024.106435. Epub 2024 Jun 22.
Understanding the training dynamics of deep ReLU networks is a significant area of interest in deep learning. However, the weight vector dynamics have not been fully characterized, even for single ReLU neurons. To bridge this gap, we study the training dynamics of the gradient flow w(t) for a single ReLU neuron under the square loss, decomposing it into its magnitude ‖w(t)‖ and angle φ(t) components. Through this decomposition, we establish upper and lower bounds on both components that elucidate the convergence dynamics. Furthermore, we show empirically that our findings extend to general two-layer multi-neuron networks. All theoretical results are generalized to the gradient descent method and rigorously verified through experiments.
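For concreteness, one plausible formalization of this decomposition is the teacher-student reading below, written in LaTeX; the fixed target vector v and the expectation over the input distribution are assumptions for illustration, since the abstract does not specify the target.

    % Square loss of a single ReLU student w against an assumed teacher v,
    % with \sigma(z) = \max(z, 0):
    L(w) = \tfrac{1}{2}\,\mathbb{E}_{x}\!\left[\left(\sigma(w^{\top}x) - \sigma(v^{\top}x)\right)^{2}\right]
    % Gradient-flow dynamics and the two tracked components:
    \dot{w}(t) = -\nabla L(w(t)), \qquad
    \varphi(t) = \arccos\frac{w(t)^{\top}v}{\lVert w(t)\rVert\,\lVert v\rVert}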
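The Python sketch below shows how the two components can be tracked numerically under gradient descent, the discrete-time counterpart the abstract mentions. It is a minimal illustration, not the authors' implementation: the teacher vector v, the Gaussian inputs, and all hyperparameters are assumptions.

    import numpy as np

    # Minimal illustrative sketch (not the authors' code): gradient descent
    # on the square loss for a single ReLU student neuron against a fixed
    # teacher vector v. The teacher-student setup, Gaussian inputs, and all
    # hyperparameters here are assumptions for illustration only.
    rng = np.random.default_rng(0)
    d, n, lr, steps = 10, 2000, 0.05, 501

    X = rng.standard_normal((n, d))     # inputs x ~ N(0, I_d) (assumed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)              # unit-norm teacher (assumed)
    y = np.maximum(X @ v, 0.0)          # teacher outputs ReLU(v·x)
    w = 0.1 * rng.standard_normal(d)    # small random initialization

    for t in range(steps):
        pre = X @ w
        resid = np.maximum(pre, 0.0) - y          # ReLU(w·x) - ReLU(v·x)
        # grad L(w) = (1/n) * sum resid * 1{w·x > 0} * x  (square loss)
        grad = (X * ((pre > 0.0) * resid)[:, None]).mean(axis=0)
        w -= lr * grad
        if t % 100 == 0:
            norm_w = np.linalg.norm(w)                       # magnitude ||w(t)||
            phi = np.arccos(np.clip(w @ v / norm_w, -1, 1))  # angle phi(t) to v
            print(f"t={t:4d}  |w|={norm_w:.4f}  phi={phi:.4f} rad")

Under this assumed setup, the printed trajectory separates the dynamics exactly as the abstract describes: ‖w(t)‖ evolves toward the teacher's norm while φ(t) shrinks toward zero.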