School of Psychological Science, University of Bristol, 12a Priory Road, Bristol BS8 1TU, UK.
Neural Netw. 2022 Apr;148:96-110. doi: 10.1016/j.neunet.2021.12.005. Epub 2021 Dec 17.
Deep Convolutional Neural Networks (DNNs) have achieved superhuman accuracy on standard image classification benchmarks. Their success has reignited significant interest in their use as models of the primate visual system, bolstered by claims of architectural and representational similarities between the two. However, closer scrutiny suggests that these models rely on various forms of shortcut learning to achieve their impressive performance, such as using texture rather than shape information. Such superficial solutions to image recognition have been shown to make DNNs brittle when faced with more challenging tests, such as noise-perturbed or out-of-distribution images, casting doubt on their similarity to their biological counterparts. In the present work, we demonstrate that adding fixed biological filter banks, in particular banks of Gabor filters, helps constrain the networks to avoid reliance on shortcuts, leading them to develop more structured internal representations and greater tolerance to noise. Importantly, these networks also achieved around 20-35% higher accuracy than standard end-to-end trained architectures when generalising to our novel out-of-distribution test image sets. We take these findings to suggest that these properties of the primate visual system should be incorporated into DNNs to make them better able to cope with real-world vision and to capture some of the more impressive aspects of human visual perception, such as generalisation.
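To make the idea of a "fixed Gabor filter bank" concrete, the following is a minimal NumPy sketch, not the authors' implementation. Each Gabor kernel is a Gaussian envelope multiplied by an oriented cosine carrier, and the bank is applied as a first layer whose weights are never updated. The kernel size, orientations, wavelengths, aspect ratio `gamma`, and `sigma_ratio` below are illustrative assumptions, as are the helper names `gabor_kernel`, `gabor_bank`, and `fixed_gabor_layer`; the abstract does not specify any of them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Return a size x size Gabor kernel: Gaussian envelope times a cosine carrier.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    assert size % 2 == 1, "use an odd kernel size so the kernel has a centre pixel"
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinate frame so the carrier varies along orientation theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + (gamma * y_rot) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / lambd + psi)
    return envelope * carrier


def gabor_bank(size=11, n_orientations=4, wavelengths=(4.0, 8.0), sigma_ratio=0.5):
    """Build a hypothetical bank covering every (wavelength, orientation) pair."""
    kernels = []
    for lambd in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernels.append(
                gabor_kernel(size, sigma=sigma_ratio * lambd, theta=theta, lambd=lambd)
            )
    return np.stack(kernels)  # shape: (len(wavelengths) * n_orientations, size, size)


def fixed_gabor_layer(image, bank):
    """Cross-correlate a 2-D image with every frozen kernel ("valid" padding).

    Returns an array of shape (n_filters, H - k + 1, W - k + 1).
    """
    windows = sliding_window_view(image, bank.shape[1:])  # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,fkl->fij", windows, bank)


bank = gabor_bank()
image = np.random.default_rng(0).random((32, 32))
responses = fixed_gabor_layer(image, bank)
print(bank.shape, responses.shape)  # (8, 11, 11) (8, 22, 22)
```

In a full network, `responses` would feed the trainable layers while `bank` stays frozen. In a deep-learning framework this would typically mean initialising the first convolution with these kernels and excluding its weights from gradient updates, so that training cannot reshape the front end toward texture-based shortcuts.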