Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA.
Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA.
Phys Rev E. 2021 Feb;103(2-1):022404. doi: 10.1103/PhysRevE.103.022404.
Many sensory pathways in the brain include sparsely active populations of neurons downstream from the input stimuli. The biological purpose of this expanded structure is unclear, but it may be beneficial due to the increased expressive power of the network. In this work, we show that certain ways of expanding a neural network can improve its generalization performance even when the expanded structure is pruned after the learning period. To study this setting, we use a teacher-student framework in which a perceptron teacher network generates labels corrupted by a small amount of noise. We then train a student network that is structurally matched to the teacher and can therefore achieve optimal accuracy if given the teacher's synaptic weights. We find that sparsely expanding the input layer of the student perceptron both increases its capacity and improves its generalization performance when learning a noisy rule from the teacher, even when the expansion is pruned after learning. We find similar behavior when the expanded units are stochastic and uncorrelated with the input, and we analyze this network in the mean-field limit. By solving the mean-field equations, we show that the generalization error of the stochastically expanded student network continues to drop as the size of the network increases. This improvement in generalization performance occurs despite the increased complexity of the student network relative to the teacher it is trying to learn. We show that this effect is closely related to the addition of slack variables in artificial neural networks and suggest possible implications for artificial and biological neural networks.
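A standard formalization of the noisy teacher-student setup described above, written here in notation that is our assumption rather than the paper's, has a teacher perceptron with weights $\mathbf{w}^{*}$ assign each input $\mathbf{x}^{\mu}$ a binary label corrupted by a small noise term $\eta^{\mu}$:

$$
y^{\mu} = \operatorname{sign}\!\left(\mathbf{w}^{*}\cdot\mathbf{x}^{\mu} + \eta^{\mu}\right), \qquad \eta^{\mu}\sim\mathcal{N}(0,\sigma^{2}),
$$

with the student's generalization error measured as the probability of disagreeing with the noiseless teacher on a fresh input, $\varepsilon_{g} = \Pr\!\big[\operatorname{sign}(\mathbf{w}\cdot\mathbf{x}) \neq \operatorname{sign}(\mathbf{w}^{*}\cdot\mathbf{x})\big]$.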
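As a concrete illustration of the expand-train-prune experiment, the following is a minimal finite-size simulation sketch in Python with NumPy. All names and parameter values (input dimension N, expansion size M, the sparse random projection used for the expansion, the classical perceptron learning rule, and the label-noise level) are illustrative assumptions, not the authors' exact protocol; the paper's quantitative results are obtained in the mean-field limit.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100         # input dimension (teacher size) -- assumed value
M = 400         # size of the sparse expansion layer -- assumed value
P = 600         # number of training examples
sparsity = 0.1  # fraction of nonzero expansion weights -- assumed
noise = 0.1     # label-noise amplitude -- assumed

# Teacher perceptron: generates labels corrupted by a small noise term.
w_teacher = rng.standard_normal(N)
X = rng.standard_normal((P, N))
y = np.sign(X @ w_teacher + noise * rng.standard_normal(P))

# Fixed sparse random expansion of the input layer (one assumed choice).
mask = rng.random((N, M)) < sparsity
F = rng.standard_normal((N, M)) * mask
H = np.sign(X @ F)              # expanded representation of the inputs

# The student sees the original inputs plus the expanded units.
Z = np.hstack([X, H])

# Train the student with the classical perceptron update rule.
w_student = np.zeros(N + M)
for _ in range(200):            # training epochs
    for mu in rng.permutation(P):
        if np.sign(Z[mu] @ w_student) != y[mu]:
            w_student += y[mu] * Z[mu]

# Prune the expansion after learning: keep only the input-layer weights.
w_pruned = w_student[:N]

# Generalization error of the pruned student against the noiseless teacher.
X_test = rng.standard_normal((5000, N))
y_test = np.sign(X_test @ w_teacher)
err = np.mean(np.sign(X_test @ w_pruned) != y_test)
print(f"pruned-student generalization error: {err:.3f}")
```

Comparing `err` against the same pipeline run with `M = 0` (no expansion) is one way to probe, at finite size, the effect the abstract describes: whether training with the expansion and discarding it afterward still leaves the pruned student closer to the teacher.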