Faculty of Science, Toho University, Miyama 2-2-1, Funabashi, Chiba 274-8510
School of Science and Engineering, Tokyo Denki University, Ishizaka, Hatoyama-machi, Hiki-gun, Saitama 350-0394.
eNeuro. 2021 Feb 9;8(1). doi: 10.1523/ENEURO.0200-20.2020. Print 2021 Jan-Feb.
Attentional selection is a function that allocates the brain's computational resources to the most important part of a visual scene at a specific moment. Saliency map models have been proposed as computational models to predict the spatial locations of attentional selection. Recent saliency map models based on deep convolutional neural networks (DCNNs) achieve the highest performance in predicting the locations of attentional selection and of human gaze, the latter reflecting overt attention. Trained DCNNs potentially provide insight into the perceptual mechanisms of biological visual systems. However, the relationship between the artificial and neural representations used for determining attentional selection and gaze location remains unknown. To understand the mechanism underlying DCNN-based saliency map models and the neural system of attentional selection, we investigated the correspondence between the layers of a DCNN saliency map model and monkey visual areas in their representations of natural images. We compared the characteristics of the responses in each layer of the model with those of the neural representations in the primary visual (V1), intermediate visual (V4), and inferior temporal (IT) cortices. Regardless of the DCNN layer level, the characteristics of the responses were consistent with those of the neural representation in V1. We found marked peaks of correspondence between V1 and both the early-level and the higher-intermediate-level layers of the model. These results provide insight into the mechanism of the trained DCNN saliency map model and suggest that the neural representations in V1 play an important role in computing the saliency that mediates attentional selection, which supports the V1 saliency hypothesis.
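The abstract does not say which measure was used to compare model layers with cortical areas. As an illustration only, the sketch below assumes one common choice, correlating representational dissimilarity matrices (RDMs), and runs on synthetic data. The function names (`rdm`, `rdm_similarity`, `layer_area_correspondence`), the array shapes, and the random data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def rdm(responses):
    """Representational dissimilarity matrix for one layer or area.

    responses: array of shape (n_images, n_units).
    Entry (i, j) is 1 - Pearson r between the response patterns
    evoked by image i and image j.
    """
    return 1.0 - np.corrcoef(responses)


def rdm_similarity(a, b):
    """Pearson correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(a, k=1)  # skip the diagonal
    return float(np.corrcoef(a[iu], b[iu])[0, 1])


def layer_area_correspondence(layer_responses, area_responses):
    """Score every (layer, area) pair; returns {layer: {area: score}}."""
    area_rdms = {name: rdm(r) for name, r in area_responses.items()}
    return {
        layer: {
            area: rdm_similarity(rdm(r), area_rdm)
            for area, area_rdm in area_rdms.items()
        }
        for layer, r in layer_responses.items()
    }


# Synthetic stand-ins (assumption): one "layer" shares latent structure
# with the simulated "V1" responses, the other is unrelated noise.
rng = np.random.default_rng(0)
n_images = 20
latent = rng.normal(size=(n_images, 5))

layers = {
    "early": latent @ rng.normal(size=(5, 64)),
    "late": rng.normal(size=(n_images, 64)),
}
areas = {
    "V1": latent @ rng.normal(size=(5, 30))
    + 0.1 * rng.normal(size=(n_images, 30)),
}

scores = layer_area_correspondence(layers, areas)
for layer, by_area in scores.items():
    print(layer, by_area)
```

With real data, `layers` would hold the model's activations for a shared image set and `areas` the recorded neural responses to the same images. A peak in the score across layers would then mark the layer whose representation is closest to a given area under this particular measure.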