Dominé Clémentine C J, Braun Lukas, Fitzgerald James E, Saxe Andrew M
Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom.
Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom.
J Stat Mech. 2023 Nov 1;2023(11):114004. doi: 10.1088/1742-5468/ad01b8. Epub 2023 Nov 15.
Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution (Fukumizu 1998). We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training, for a broad class of initialisations and tasks. The expressions reveal a class of task-independent initialisations that radically alter learning dynamics, from slow non-linear trajectories to fast exponential ones, while still converging to a global optimum with identical representational similarity, thereby dissociating learning trajectories from the structure of initial internal representations. We characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small initial weights without incorporating their fine-scale structure. Finally, we discuss the implications of these findings for continual learning, reversal learning, and the learning of structured knowledge. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning.
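For concreteness, the simplest setting covered by this line of analysis is a two-layer linear network ŷ = W₂W₁x trained on squared error, whose gradient-flow dynamics take the standard form from the deep linear network literature:

    τ dW₁/dt = W₂ᵀ(Σ_yx − W₂W₁Σ_x),   τ dW₂/dt = (Σ_yx − W₂W₁Σ_x)W₁ᵀ,

where Σ_x and Σ_yx are the input and input-output correlation matrices of the task. The NumPy sketch below is an illustration of the qualitative phenomenon in the abstract, not the paper's exact Riccati solution: it discretises these dynamics with plain gradient descent on a whitened toy task and contrasts a small random initialisation (a long plateau followed by rapid non-linear transitions) with a larger-scale one (fast, approximately exponential loss decay). All dimensions, scales, and the learning rate are arbitrary choices for the demonstration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Whitened toy regression task: Sigma_x = I, target map Y.
    d_in, d_hidden, d_out = 8, 8, 8
    X = np.eye(d_in)                        # identity design, so Sigma_x = I
    Y = rng.standard_normal((d_out, d_in))  # target linear map

    def train(scale, steps=2000, lr=0.05):
        """Gradient descent on L = ||Y - W2 W1 X||_F^2 / 2, full batch."""
        W1 = scale * rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
        W2 = scale * rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)
        losses = []
        for _ in range(steps):
            E = Y - W2 @ W1 @ X            # error of the current network map
            losses.append(0.5 * np.sum(E ** 2))
            gW1 = -W2.T @ E @ X.T          # dL/dW1
            gW2 = -E @ (W1 @ X).T          # dL/dW2
            W1 -= lr * gW1
            W2 -= lr * gW2
        return np.array(losses)

    # Small weights: slow sigmoidal dynamics with an initial plateau.
    small = train(scale=1e-3)
    # Larger weights: fast, roughly exponential decay from the start.
    large = train(scale=1.0)

    print(f"loss at step 100, small init: {small[100]:.4f}")
    print(f"loss at step 100, large init: {large[100]:.4f}")

Plotting the two loss curves makes the contrast described in the abstract visible: the small-initialisation run sits near its starting loss before learning each task mode in a rapid transition, while the larger-initialisation run decays smoothly from the first step.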