Psychology Department, Yale University, New Haven, CT, USA.
Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, MD, USA.
Nat Commun. 2021 Apr 6;12(1):2065. doi: 10.1038/s41467-021-22244-7.
Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses. Here we compare 14 different CNNs with human fMRI responses to natural and artificial images using representational similarity analysis. Despite the presence of some CNN-brain correspondence and CNNs' impressive ability to fully capture lower-level visual representations of real-world objects, we show that CNNs do not fully capture higher-level visual representations of real-world objects, nor those of artificial objects at either lower or higher levels of visual representation. The latter is particularly critical, as the processing of both real-world and artificial visual stimuli engages the same neural circuits. We report similar results regardless of differences in CNN architecture, training, or the presence of recurrent processing. This indicates that some fundamental differences exist in how the brain and CNNs represent visual information.
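The abstract names representational similarity analysis as the method for comparing CNNs with fMRI responses. As a rough illustration of the general idea only, the sketch below builds a representational dissimilarity matrix (RDM) for each system and correlates the two. Everything in it is an assumption made for illustration: the function names (`rdm`, `upper_triangle`, `rank`, `rdm_similarity`), the 1 minus Pearson correlation distance, the rank correlation between RDMs, and the random synthetic data. It is not the authors' pipeline, which may use a different distance measure, normalization, and noise-ceiling correction.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between rows.

    patterns: array of shape (n_conditions, n_features), one row per image
    condition (voxel responses for a brain region, or unit activations for
    a CNN layer).
    """
    return 1.0 - np.corrcoef(patterns)

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix as a vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]

def rank(values):
    """Ordinal ranks (0..n-1); ties are not averaged in this sketch."""
    ranks = np.empty(len(values), dtype=float)
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks

def rdm_similarity(rdm_a, rdm_b):
    """Spearman-style correlation between the upper triangles of two RDMs."""
    a = rank(upper_triangle(rdm_a))
    b = rank(upper_triangle(rdm_b))
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins (assumed sizes): 8 image conditions.
rng = np.random.default_rng(0)
brain_patterns = rng.normal(size=(8, 200))   # hypothetical voxel responses
layer_patterns = rng.normal(size=(8, 512))   # hypothetical CNN layer units

brain_rdm = rdm(brain_patterns)
layer_rdm = rdm(layer_patterns)

# Similarity between the brain-region RDM and the CNN-layer RDM.
score = rdm_similarity(brain_rdm, layer_rdm)

# Sanity check: an RDM compared with itself correlates perfectly.
self_score = rdm_similarity(brain_rdm, brain_rdm)
```

Because the two systems are compared through their RDMs rather than through raw responses, the number of voxels and the number of CNN units do not need to match; only the set of image conditions must be the same.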