Jozwik Kamila M, Kriegeskorte Nikolaus, Storrs Katherine R, Mur Marieke
Neural Dynamics of Visual Cognition, Department of Education and Psychology, Free University of Berlin, Berlin, Germany.
Memory and Perception Group, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom.
Front Psychol. 2017 Oct 9;8:1726. doi: 10.3389/fpsyg.2017.01726. eCollection 2017.
Recent advances in deep convolutional neural networks (DNNs) have enabled unprecedentedly accurate computational models of brain representations, and present an exciting opportunity to model diverse cognitive functions. State-of-the-art DNNs achieve human-level performance on object categorization, but it is unclear how well they capture human behavior on complex cognitive tasks. Recent reports suggest that DNNs can explain significant variance in one such task, judging object similarity. Here, we extend these findings by replicating them for a rich set of object images, comparing performance across layers within two DNNs of different depths, and examining how the DNNs' performance compares to that of non-computational "conceptual" models. Human observers performed similarity judgments for a set of 92 images of real-world objects. Representations of the same images were obtained in each of the layers of two DNNs of different depths (8-layer AlexNet and 16-layer VGG-16). To create conceptual models, other human observers generated visual-feature labels (e.g., "eye") and category labels (e.g., "animal") for the same image set. Feature labels were divided into parts, colors, textures, and contours, while category labels were divided into subordinate, basic, and superordinate categories. We fitted models derived from the features, categories, and from each layer of each DNN to the similarity judgments, using representational similarity analysis to evaluate model performance. In both DNNs, similarity within the last layer explains most of the explainable variance in human similarity judgments. The last layer outperforms almost all feature-based models. Late and mid-level layers outperform some but not all feature-based models. Importantly, categorical models predict similarity judgments significantly better than any DNN layer. Our results provide further evidence for commonalities between DNNs and brain representations.
Models derived from visual features other than object parts perform relatively poorly, perhaps because DNNs more comprehensively capture the colors, textures and contours which matter to human object perception. However, categorical models outperform DNNs, suggesting that further work may be needed to bring high-level semantic representations in DNNs closer to those extracted by humans. Modern DNNs explain similarity judgments remarkably well considering they were not trained on this task, and are promising models for many aspects of human cognition.
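The model comparison described above rests on representational similarity analysis: each candidate model (a DNN layer, a feature-based model, or a categorical model) yields a representational dissimilarity matrix (RDM) over the 92 images, which is then compared against the RDM built from human similarity judgments. A minimal sketch of that evaluation step, using random arrays as stand-ins for the real activations and judgments (the actual study fitted weighted combinations of model RDMs; the simple rank correlation below is only the core comparison):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Representational dissimilarity matrix: pairwise correlation
    distance between item feature vectors, returned as a condensed
    vector (the upper triangle of the square RDM)."""
    return pdist(features, metric="correlation")

rng = np.random.default_rng(0)
n_images = 92  # size of the stimulus set in the study

# Stand-ins (hypothetical data): human dissimilarity judgments as a
# condensed vector, and one DNN layer's activations for each image.
judged = rng.random(n_images * (n_images - 1) // 2)
layer_features = rng.random((n_images, 4096))

# Model performance: rank correlation between the model RDM and the
# human-judgment RDM (higher = model better predicts the judgments).
rho, _ = spearmanr(rdm(layer_features), judged)
print(f"Spearman rho: {rho:.3f}")
```

In practice each model's correlation is compared against a noise ceiling estimated from inter-observer reliability, which is what lets the study say the last DNN layer explains "most of the explainable variance" while still falling short of the categorical models.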