Department of Psychology, University of Bristol, Bristol, BS8 1TL, United Kingdom.
Neural Netw. 2022 Jun;150:222-236. doi: 10.1016/j.neunet.2022.02.017. Epub 2022 Mar 5.
Humans can identify objects following various spatial transformations, such as changes in scale and viewpoint. This ability extends to novel objects after a single presentation at a single pose, a capacity sometimes referred to as online invariance. CNNs have been proposed as a compelling model of human vision, but their ability to identify objects across transformations is typically tested on held-out samples of trained categories after extensive data augmentation. This paper assesses whether standard CNNs can support human-like online invariance by training models to recognize images of synthetic 3D objects that undergo several transformations: rotation, scaling, translation, brightness, contrast, and viewpoint. Through analysis of the models' internal representations, we show that standard supervised CNNs trained on transformed objects can acquire strong invariances on novel classes even when trained with as few as 50 objects taken from 10 classes. These results extended to a different dataset of photographs of real objects. We also show that these invariances can be acquired in a self-supervised way, by solving the same/different task. We suggest that this latter approach may be similar to how humans acquire invariances.
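To make the self-supervised route concrete, the following is a minimal sketch (not the paper's code) of a siamese-style "same/different" objective that could, in principle, drive invariance learning from pairs of transformed views. The architecture, hyperparameters, and the `SmallCNN` / `same_different_loss` names are illustrative assumptions; the paper's actual networks, rendered-3D-object dataset, and training procedure differ.

```python
# Sketch only: a toy same/different training step in PyTorch.
# All choices below (architecture, embedding size, loss scaling) are assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallCNN(nn.Module):
    """Toy convolutional encoder standing in for a standard CNN backbone."""

    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))


def same_different_loss(encoder, img_a, img_b, same):
    """Binary 'same object / different object' loss on image pairs.

    img_a and img_b are two views (e.g. different rotations, scales, or
    viewpoints); same is 1 when both views show the same object, else 0.
    """
    za, zb = encoder(img_a), encoder(img_b)
    logits = F.cosine_similarity(za, zb) * 10.0  # temperature-scaled similarity
    return F.binary_cross_entropy_with_logits(logits, same.float())


if __name__ == "__main__":
    encoder = SmallCNN()
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    # Placeholder batch: random tensors stand in for transformed object renders.
    img_a, img_b = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
    same = torch.randint(0, 2, (8,))
    loss = same_different_loss(encoder, img_a, img_b, same)
    loss.backward()
    opt.step()
    print(f"toy loss: {loss.item():.3f}")
```

In this kind of setup, no category labels are needed: the only supervisory signal is whether two transformed views depict the same object, which is the sense in which the same/different objective is self-supervised.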