

Hiding a plane with a pixel: examining shape-bias in CNNs and the benefit of building in biological constraints.

Affiliations

School of Psychological Science, University of Bristol, Bristol BS8 1TU, UK.


Publication information

Vision Res. 2020 Sep;174:57-68. doi: 10.1016/j.visres.2020.04.013. Epub 2020 Jun 28.

Abstract

When deep convolutional neural networks (CNNs) are trained "end-to-end" on raw data, some of the feature detectors they develop in their early layers resemble the representations found in early visual cortex. This result has been used to draw parallels between deep learning systems and human visual perception. In this study, we show that when CNNs are trained end-to-end they learn to classify images based on whatever feature is predictive of a category within the dataset. This can lead to bizarre results where CNNs learn idiosyncratic features such as high-frequency noise-like masks. In the extreme case, our results demonstrate image categorisation on the basis of a single pixel. Such features are extremely unlikely to play any role in human object recognition, where experiments have repeatedly shown a strong preference for shape. Through a series of empirical studies with standard high-performance CNNs, we show that these networks do not develop a shape-bias merely through regularisation methods or more ecologically plausible training regimes. These results raise doubts over the assumption that simply learning end-to-end in standard CNNs leads to the emergence of similar representations to the human visual system. In the second part of the paper, we show that CNNs are less reliant on these idiosyncratic features when we forgo end-to-end learning and introduce hard-wired Gabor filters designed to mimic early visual processing in V1.
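The two manipulations the abstract describes can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' code. The first function plants a category-predictive confound: each class gets one fixed pixel location and colour, so that pixel alone predicts the label. The remaining functions build a fixed bank of Gabor filters, the kind of hard-wired V1-like front end the second part of the paper substitutes for learned first-layer filters. The helper names (`add_pixel_confound`, `gabor_kernel`, `gabor_bank`) are hypothetical. So are the parameter values: 15×15 kernels, 8 orientations, wavelengths of 4 and 8 pixels, σ = 0.5λ, and aspect ratio γ = 0.5. The exact pixel manipulation and filter settings used in the study may differ. The kernel follows the standard Gabor form $g(x, y) = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\!\left(\frac{2\pi x'}{\lambda} + \psi\right)$, where $(x', y')$ are the coordinates rotated by orientation $\theta$.

```python
import numpy as np


def add_pixel_confound(images, labels, num_classes, seed=0):
    """Hypothetical helper: stamp one class-specific pixel into every image.

    images: float array of shape (N, H, W, C) with values in [0, 1].
    labels: int array of shape (N,), values in range(num_classes).
    Each class gets its own fixed location and colour, so that single
    pixel predicts the label perfectly (an illustrative confound only).
    """
    rng = np.random.default_rng(seed)
    _, h, w, c = images.shape
    # Distinct (row, col) location per class, drawn without replacement.
    flat = rng.choice(h * w, size=num_classes, replace=False)
    locs = np.stack(np.unravel_index(flat, (h, w)), axis=1)
    colours = rng.random((num_classes, c))  # one fixed colour per class
    out = images.copy()  # leave the caller's array untouched
    for i, y in enumerate(labels):
        row, col = locs[y]
        out[i, row, col, :] = colours[y]
    return out, locs, colours


def gabor_kernel(size, sigma, theta, lam, psi=0.0, gamma=0.5):
    """2-D Gabor: Gaussian envelope times an oriented cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier varies along orientation theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + gamma**2 * y_t**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lam + psi)
    return envelope * carrier


def gabor_bank(size=15, n_orient=8, wavelengths=(4.0, 8.0),
               phases=(0.0, np.pi / 2)):
    """Fixed filter bank spanning orientations, wavelengths and phases."""
    kernels = []
    for lam in wavelengths:
        sigma = 0.5 * lam  # assumed bandwidth rule, not taken from the paper
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            for psi in phases:
                kernels.append(gabor_kernel(size, sigma, theta, lam, psi))
    return np.stack(kernels)  # shape: (n_filters, size, size)
```

Per the abstract, a network trained end-to-end on data confounded this way can learn to classify by the planted pixel rather than by object shape. To hard-wire the bank, one option is to copy the kernels, replicated across input channels, into the first convolutional layer's weights. That layer is then excluded from gradient updates, for example by setting `requires_grad = False` on its weight tensor in PyTorch. How the study itself wired the filters into its networks is not specified here.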

