School of Psychological Sciences, University of Bristol, Bristol, United Kingdom.
PLoS Comput Biol. 2022 May 13;18(5):e1009572. doi: 10.1371/journal.pcbi.1009572. eCollection 2022 May.
Humans rely heavily on the shape of objects to recognise them. Recently, it has been argued that Convolutional Neural Networks (CNNs) can also show a shape-bias, provided their learning environment contains this bias. This has led to the proposal that CNNs provide good mechanistic models of shape-bias and, more generally, human visual processing. However, it is also possible that humans and CNNs show a shape-bias for very different reasons: shape-bias in humans may be a consequence of architectural and cognitive constraints, whereas CNNs show a shape-bias as a consequence of learning the statistics of the environment. We investigated this question by exploring shape-bias in humans and CNNs when they learn in a novel environment. We observed that, in this new environment, humans (i) focused on shape and overlooked many non-shape features, even when non-shape features were more diagnostic, (ii) learned based on only one out of multiple predictive features, and (iii) failed to learn when global features, such as shape, were absent. This behaviour contrasted with the predictions of a statistical inference model with no priors, showing the strong role that shape-bias plays in human feature selection. It also contrasted with the behaviour of CNNs, which (i) preferred to categorise objects based on non-shape features, and (ii) increased reliance on these non-shape features as they became more predictive. This was the case even when the CNN was pre-trained to have a shape-bias and the convolutional backbone was frozen. These results suggest that shape-bias has a different source in humans and CNNs: while learning in CNNs is driven by the statistical properties of the environment, humans are highly constrained by their previous biases, which suggests that cognitive constraints play a key role in how humans learn to recognise novel objects.