Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan.
School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, People's Republic of China; Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan.
Neural Netw. 2022 Nov;155:224-241. doi: 10.1016/j.neunet.2022.08.015. Epub 2022 Aug 23.
Visual properties that primarily attract bottom-up attention are collectively referred to as saliency. In this study, to understand the neural activity involved in top-down and bottom-up visual attention, we aim to prepare pairs of natural and unnatural images that share a common saliency map. For this purpose, we propose an image transformation method based on deep neural networks that generates new images while keeping a given feature map, in particular the saliency map, consistent. This is an ill-posed problem because the mapping from an image to its feature map can be many-to-one; in our case, many different images share the same saliency map. Although stochastic image generation has the potential to solve such ill-posed problems, most existing methods focus on diversifying the overall style or touch while maintaining the naturalness of the generated images. We therefore developed a new image transformation method that incorporates higher-dimensional latent variables, so that the generated images appear unnatural, with less context information, yet retain a high diversity of local image structures. Because such high-dimensional latent spaces are prone to collapse, we propose a new regularization based on the Kullback-Leibler divergence that prevents the latent distribution from collapsing. We also conducted human experiments with the newly prepared natural images and their unnatural counterparts, recording overt eye movements and functional magnetic resonance imaging (fMRI) data, and found that these images induced distinct neural activity related to top-down and bottom-up attentional processing.
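The abstract does not spell out the form of the proposed regularizer, so the following is only background, not the authors' method. As a minimal sketch of what a Kullback-Leibler term on a latent distribution commonly looks like, the snippet below computes the textbook closed-form KL divergence between a diagonal Gaussian and the standard normal prior, as used in variational autoencoders. The function name and its arguments (`mu`, `log_var`) are hypothetical and chosen for illustration.

```python
import math


def kl_diag_gaussian_to_std_normal(mu, log_var):
    """Return KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats.

    Assumption: this is the standard VAE-style KL term, shown as background.
    It is NOT the new regularization proposed in the paper.
    """
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    # Per-dimension term: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2)
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


# A posterior identical to the prior has zero divergence.
print(kl_diag_gaussian_to_std_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
# Shifting the mean of one unit-variance dimension by 1 costs 0.5 nats.
print(kl_diag_gaussian_to_std_normal([1.0], [0.0]))  # 0.5
```

In VAE-style models, driving this term to zero in every latent dimension makes the latent code uninformative, which is one common meaning of a "collapsed" latent distribution; whether the paper uses "collapse" in exactly this sense is not stated in the abstract.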