MRC Cognition and Brain Sciences Unit, University of Cambridge, CB2 7EF Cambridge, United Kingdom.
Department of Psychology, Zuckerman Institute, Columbia University, New York, NY 10027.
Proc Natl Acad Sci U S A. 2021 Feb 23;118(8). doi: 10.1073/pnas.2011417118.
Deep neural networks provide the current best models of visual information processing in the primate brain. Drawing on work from computer vision, the most commonly used networks are pretrained on data from the ImageNet Large Scale Visual Recognition Challenge. This dataset comprises images from 1,000 categories, selected to provide a challenging testbed for automated visual object recognition systems. Moving beyond this common practice, we here introduce ecoset, a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans. Ecoset categories were chosen to be both frequent in linguistic usage and concrete, thereby mirroring important physical objects in the world. We test the effects of training on this ecologically more valid dataset using multiple instances of two neural network architectures: AlexNet and vNet, a novel architecture designed to mimic the progressive increase in receptive field sizes along the human ventral stream. We show that training on ecoset leads to significant improvements in predicting representations in human higher-level visual cortex and perceptual judgments, surpassing the previous state of the art. Significant and highly consistent benefits are demonstrated for both architectures on two separate functional magnetic resonance imaging (fMRI) datasets and behavioral data, jointly covering responses to 1,292 visual stimuli from a wide variety of object categories. These results suggest that computational visual neuroscience may take better advantage of the deep learning framework by using image sets that reflect the human perceptual and cognitive experience. Ecoset and trained network models are openly available to the research community.