Institute of Psychology, University of Tartu, Näituse 2, 50409, Tartu, Estonia.
Atten Percept Psychophys. 2024 Jan;86(1):9-15. doi: 10.3758/s13414-023-02697-2. Epub 2023 Mar 28.
Recently, Zhang et al. (Nature Communications, 9(1), 3730, 2018) proposed an interesting model of attention guidance that uses visual features learnt by convolutional neural networks (CNNs) for object classification. I adapted this model for search experiments, with accuracy as the measure of performance. Simulation of our previously published feature and conjunction search experiments revealed that the CNN-based search model proposed by Zhang et al. considerably underestimates human attention guidance by simple visual features. Using target-distractor differences instead of target features for attention guidance, or computing the attention map at lower layers of the network, could improve performance. Still, the model fails to reproduce the qualitative regularities of human visual search. The most likely explanation is that standard CNNs trained on image classification have not learnt the medium- or high-level features required for human-like attention guidance.
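The contrast between guidance by target features and guidance by target-distractor differences can be illustrated with a minimal sketch. It is not the implementation used in the study or in Zhang et al.: it assumes that the attention map is a channel-weighted sum of a feature map, and it uses synthetic feature vectors in place of real CNN activations. The names `attention_map` and `most_salient_location` and the array sizes are hypothetical.

```python
import numpy as np


def attention_map(feature_map, weights):
    """Weight each channel of a (C, H, W) feature map and sum over channels.

    Assumption: guidance is modelled as a simple channel-wise weighted sum,
    a simplification of CNN-based top-down modulation.
    """
    return np.tensordot(weights, feature_map, axes=([0], [0]))


def most_salient_location(att):
    """Return the (row, col) of the maximum of a 2-D attention map."""
    row, col = np.unravel_index(np.argmax(att), att.shape)
    return int(row), int(col)


# Synthetic stand-ins for CNN features (assumed, not taken from any network).
rng = np.random.default_rng(0)
C, H, W = 8, 6, 6
target = rng.random(C)
distractor = target + 0.1 * rng.standard_normal(C)  # similar to the target

# Every location holds distractor features except one target location.
feature_map = np.tile(distractor[:, None, None], (1, H, W))
target_pos = (2, 4)
feature_map[:, target_pos[0], target_pos[1]] = target

# Guidance by target features alone.
att_target = attention_map(feature_map, target)

# Guidance by the target-distractor difference.
att_diff = attention_map(feature_map, target - distractor)

print("target-feature guidance picks:   ", most_salient_location(att_target))
print("difference-based guidance picks: ", most_salient_location(att_diff))
```

In this toy setting the difference-based map always peaks at the target, because its value there exceeds the value at any distractor location by the squared distance between the two feature vectors. The target-feature map carries no such guarantee when target and distractor are similar.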