Department of Physiology and Biophysics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America.
Department of Computing Science, University of Alberta, Edmonton, AB, Canada.
Neural Netw. 2020 Nov;131:103-114. doi: 10.1016/j.neunet.2020.07.013. Epub 2020 Jul 29.
The current state-of-the-art object recognition algorithms, deep convolutional neural networks (DCNNs), are inspired by the architecture of the mammalian visual system and are capable of human-level performance on many tasks. When trained on object recognition tasks, DCNNs have been shown to develop hidden representations that resemble those observed in the mammalian visual system (Khaligh-Razavi and Kriegeskorte, 2014; Yamins and DiCarlo, 2016; Güçlü and van Gerven, 2015; McClure and Kriegeskorte, 2016). Moreover, DCNNs trained on object recognition tasks are currently among the best models we have of the mammalian visual system. This led us to hypothesize that teaching DCNNs to achieve even more brain-like representations could improve their performance. To test this, we trained DCNNs on a composite task, wherein networks were trained to: (a) classify images of objects; while (b) having intermediate representations that resemble those observed in neural recordings from monkey visual cortex. Compared with DCNNs trained purely for object categorization, DCNNs trained on the composite task had better object recognition performance and were more robust to label corruption. Interestingly, we found that real neural data were not strictly required for this process: randomized data with the same statistical properties as the neural recordings also boosted performance. While the performance gains we observed when training on the composite task vs the "pure" object recognition task were modest, they were remarkably robust. Notably, we observed these performance gains across all network variations we studied, including: smaller (CORNet-Z) vs larger (VGG-16) architectures; variations in optimizer (Adam vs gradient descent); variations in activation function (ReLU vs ELU); and variations in network initialization. Our results demonstrate the potential utility of a new approach to training object recognition networks, using strategies in which the brain - or at least the statistical properties of its activation patterns - serves as a teacher signal for training DCNNs.
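To make the composite task concrete, the following is a minimal sketch in PyTorch, not the authors' published code. The abstract does not specify how representational resemblance is measured, so this sketch assumes a representational dissimilarity matrix (RDM) matching penalty, a common choice for comparing network and neural representations; the names `rdm`, `composite_loss`, `lambda_neural`, and `target_rdm`, and the value of the weighting coefficient, are illustrative assumptions.

```python
# Hypothetical sketch of the composite objective: (a) classify images while
# (b) pushing an intermediate layer's representational geometry toward that
# of neural recordings. Not the authors' method; an RDM-matching penalty is
# assumed here as a stand-in for their brain-similarity term.
import torch
import torch.nn.functional as F

def rdm(acts: torch.Tensor) -> torch.Tensor:
    """Representational dissimilarity matrix: 1 minus the Pearson
    correlation between the activation vectors of each pair of stimuli."""
    acts = acts.flatten(start_dim=1)                        # (n_stimuli, n_features)
    acts = acts - acts.mean(dim=1, keepdim=True)            # center each row
    acts = acts / (acts.norm(dim=1, keepdim=True) + 1e-8)   # unit-norm rows
    return 1.0 - acts @ acts.T                              # (n_stimuli, n_stimuli)

def composite_loss(logits, labels, hidden_acts, target_rdm, lambda_neural=0.1):
    """(a) standard classification term plus (b) a penalty on the mismatch
    between the intermediate-layer RDM and a target RDM derived from neural
    recordings (or, per the abstract, from randomized surrogate data with
    the same statistical properties)."""
    task_loss = F.cross_entropy(logits, labels)             # object categorization
    neural_loss = F.mse_loss(rdm(hidden_acts), target_rdm)  # brain-similarity
    return task_loss + lambda_neural * neural_loss
```

Note that, consistent with the abstract's finding, `target_rdm` need not come from real recordings: a matrix built from randomized data with matched statistical properties reportedly yields similar performance gains.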