Alamia Andrea, Luo Canhuang, Ricci Matthew, Kim Junkyung, Serre Thomas, VanRullen Rufin
CerCo, Centre National de la Recherche Scientifique, Université de Toulouse, Toulouse 31055, France.
Department of Cognitive, Linguistic and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI 02912.
eNeuro. 2021 Jan 28;8(1). doi: 10.1523/ENEURO.0267-20.2020. Print 2021 Jan-Feb.
The development of deep convolutional neural networks (CNNs) has recently led to great successes in computer vision, and CNNs have become de facto computational models of vision. However, a growing body of work suggests that they exhibit critical limitations on tasks beyond image categorization. Here, we study one such fundamental limitation, concerning the judgment of whether two simultaneously presented items are the same or different (SD), compared with a baseline assessment of their spatial relationship (SR). In both human subjects and artificial neural networks, we test the prediction that SD tasks recruit additional cortical mechanisms, which underlie critical aspects of visual cognition that current computational models do not explain. We thus recorded electroencephalography (EEG) signals from human participants engaged in the same tasks as the computational models. Importantly, in humans the two tasks were matched in difficulty by an adaptive psychometric procedure; yet, on top of a modulation of evoked potentials (EPs), our results revealed higher activity in the low β (16–24 Hz) band in the SD condition than in the SR condition. We surmise that these oscillations reflect the crucial involvement of additional mechanisms, such as working memory and attention, which are missing in current feed-forward CNNs.
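The abstract does not say how low-β activity was quantified. As a hedged illustration of what "higher activity in the low β (16–24 Hz) band" means operationally, the sketch below estimates mean spectral power in that band for a single-channel trial using a Hann-windowed periodogram, then compares two simulated conditions. The functions `band_power`, `simulate_trial` and `compare_conditions`, the 250 Hz sampling rate, the 1 s epoch, the 20 Hz test frequency and the chosen amplitudes are all hypothetical assumptions for illustration. This is not the authors' analysis pipeline, and the data are synthetic.

```python
import math
import random


def band_power(signal, fs, f_lo=16.0, f_hi=24.0):
    """Mean periodogram power of `signal` between f_lo and f_hi Hz (inclusive).

    Hypothetical helper: Hann window plus a direct DFT evaluated only at the
    bins inside the band, so it needs nothing beyond the standard library.
    """
    n = len(signal)
    # Hann taper reduces spectral leakage from neighbouring frequencies.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    mean = sum(signal) / n
    tapered = [(x - mean) * w for x, w in zip(signal, window)]
    norm = fs * sum(w * w for w in window)  # periodogram normalisation
    powers = []
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(tapered))
            im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(tapered))
            powers.append((re * re + im * im) / norm)
    if not powers:
        raise ValueError("no frequency bins fall inside the requested band")
    return sum(powers) / len(powers)


def simulate_trial(beta_amp, rng, fs=250, duration=1.0, beta_freq=20.0):
    """Hypothetical single-channel trial: a beta_freq sinusoid plus unit-variance noise."""
    n = int(fs * duration)
    return [
        beta_amp * math.sin(2 * math.pi * beta_freq * i / fs) + rng.gauss(0.0, 1.0)
        for i in range(n)
    ]


def compare_conditions(n_trials=20, fs=250, seed=0):
    """Mean low-beta power for simulated SD vs SR trials (synthetic data only)."""
    rng = random.Random(seed)
    # Assumed amplitudes: stronger 20 Hz activity in SD than in SR, mimicking
    # only the direction of the reported effect, not its size.
    sd = [band_power(simulate_trial(1.0, rng, fs), fs) for _ in range(n_trials)]
    sr = [band_power(simulate_trial(0.3, rng, fs), fs) for _ in range(n_trials)]
    return sum(sd) / n_trials, sum(sr) / n_trials


sd_power, sr_power = compare_conditions()
print(f"SD low-beta power: {sd_power:.4f}, SR low-beta power: {sr_power:.4f}")
```

Because the amplitudes are set by hand, the printed values only show that the measure recovers the difference built into the simulation. They say nothing about the effect size reported in the study.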