Zhu Hongru, Yuille Alan, Kersten Daniel
Department of Cognitive Science, Johns Hopkins University.
Department of Psychology, University of Minnesota Twin Cities.
Cogsci. 2021 Jul;43:223-229.
Perceiving 3D structure in natural images is an immense computational challenge for the visual system. While many previous studies have focused on the perception of rigid 3D objects, we applied a novel method to a common set of non-rigid objects: static images of the human body in the natural world. We investigated the extent to which the human ability to interpret 3D poses in natural images depends on the typicality of the underlying 3D pose and the informativeness of the viewpoint. Using a novel two-alternative forced-choice (2AFC) pose matching task, we measured how well subjects could match a target natural pose image with one of two synthetic comparison body images shown from a different viewpoint: one was rendered with the same 3D pose parameters as the target, while the other was a distractor rendered with noise added to the joint angles. Performance was measurably better for typical poses than for atypical poses; however, we found no significant difference between informative and less informative viewpoints. Further comparisons of 2D and 3D pose matching models on the same task showed that 3D body knowledge is particularly important when interpreting images of atypical poses. These results suggest that the human ability to interpret 3D poses depends on pose typicality but not on viewpoint informativeness, and that humans probably use prior knowledge of 3D pose structures.
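The trial structure of the 2AFC task can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the authors' implementation: poses are reduced to plain lists of joint angles in radians, no images are rendered, and the simulated observer (a noisy estimate of the target pose that is compared against both alternatives) is a hypothetical stand-in. All function and parameter names (`make_distractor`, `angular_distance`, `run_trial`, `proportion_correct`, `noise_sd`, `observer_sd`) are invented for illustration.

```python
import math
import random


def make_distractor(joint_angles, noise_sd, rng):
    """Copy a pose and add Gaussian noise (radians) to every joint angle."""
    return [a + rng.gauss(0.0, noise_sd) for a in joint_angles]


def angular_distance(pose_a, pose_b):
    """RMS difference between two joint-angle vectors, wrapped to [-pi, pi]."""
    diffs = [
        math.atan2(math.sin(a - b), math.cos(a - b))
        for a, b in zip(pose_a, pose_b)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def run_trial(target_pose, noise_sd, observer_sd, rng):
    """Simulate one 2AFC trial; return True if the observer picks the match."""
    match = list(target_pose)  # same pose parameters as the target
    distractor = make_distractor(target_pose, noise_sd, rng)
    # Assumption: the observer holds a noisy internal estimate of the target.
    estimate = [a + rng.gauss(0.0, observer_sd) for a in target_pose]
    d_match = angular_distance(estimate, match)
    d_distractor = angular_distance(estimate, distractor)
    if d_match == d_distractor:
        return rng.random() < 0.5  # guess when the alternatives are tied
    return d_match < d_distractor


def proportion_correct(n_trials, n_joints, noise_sd, observer_sd, seed=0):
    """Run many trials on random poses and return the fraction correct."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        pose = [rng.uniform(-math.pi / 2, math.pi / 2) for _ in range(n_joints)]
        correct += run_trial(pose, noise_sd, observer_sd, rng)
    return correct / n_trials


if __name__ == "__main__":
    # Larger observer noise relative to distractor noise should lower accuracy.
    for observer_sd in (0.0, 0.2, 1.0):
        print(observer_sd, proportion_correct(2000, 20, 0.3, observer_sd))
```

In this toy setting, accuracy depends on the ratio of the observer's internal noise to the noise added to the distractor's joint angles. It does not model viewpoint, pose typicality, or the 2D and 3D matching models compared in the study.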