Department of Psychology, Yale University, New Haven, CT, USA.
Department of Statistics & Data Science, Yale University, New Haven, CT, USA.
Nat Hum Behav. 2024 Feb;8(2):320-335. doi: 10.1038/s41562-023-01759-7. Epub 2023 Nov 23.
Many surface cues support three-dimensional shape perception, but humans can sometimes still see shape when these features are missing, such as when an object is covered with a draped cloth. Here we propose a framework for three-dimensional shape perception that explains perception in both typical and atypical cases as analysis-by-synthesis, or inference in a generative model of image formation. The model integrates intuitive physics to explain how shape can be inferred from the deformations it causes to other objects, as in cloth draping. Behavioural and computational studies comparing this account with several alternatives show that it best matches human observers (total n = 174) in both accuracy and response times, and is the only model that correlates significantly with human performance on difficult discriminations. We suggest that bottom-up deep neural network models are not fully adequate accounts of human shape perception, and point to how machine vision systems might achieve more human-like robustness.