Kallmayer Aylin, Võ Melissa L-H
Goethe University Frankfurt, Department of Psychology, Frankfurt am Main, Germany.
Commun Psychol. 2024 Jul 26;2(1):68. doi: 10.1038/s44271-024-00119-z.
Our visual surroundings are highly complex. Despite this, we understand and navigate them effortlessly. This requires transforming incoming sensory information into representations that not only span low- to high-level visual features (e.g., edges, object parts, objects), but likely also reflect co-occurrence statistics of objects in real-world scenes. Here, so-called anchor objects are defined as being highly predictive of the location and identity of frequently co-occurring (usually smaller) objects, derived from object clustering statistics in real-world scenes, while so-called diagnostic objects are predictive of the larger semantic context (i.e., scene category). Across two studies (N = 50, N = 44), we investigate which of these properties underlie scene understanding across two dimensions - realism and categorisation - using scenes generated by Generative Adversarial Networks (GANs), which naturally vary along these dimensions. We show that anchor objects and mainly high-level features extracted from a range of pre-trained deep neural networks (DNNs) drove realism both at first glance and after initial processing. Categorisation performance was mainly determined by diagnostic objects, regardless of realism, at first glance and after initial processing. Our results are testament to the visual system's ability to pick up on reliable, category-specific sources of information that are flexible towards disturbances across the visual feature hierarchy.