Wallis Thomas S A, Funke Christina M, Ecker Alexander S, Gatys Leon A, Wichmann Felix A, Bethge Matthias
Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, and the Bernstein Center for Computational Neuroscience, Tübingen, Germany.
Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, and Bernstein Center for Computational Neuroscience, Tübingen, Germany, and Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA.
J Vis. 2017 Oct 1;17(12):5. doi: 10.1167/17.12.5.
Our visual environment is full of texture ("stuff" like cloth, bark, or gravel, as distinct from "things" like dresses, trees, or paths), and humans are adept at perceiving subtle variations in material properties. To investigate which image features are important for texture perception, we psychophysically compare a recent parametric model of texture appearance (the convolutional neural network [CNN] model), which uses features encoded by a deep CNN (VGG-19), with two other models: the venerable Portilla and Simoncelli model and an extension of the CNN model in which the power spectrum is additionally matched. Observers discriminated model-generated textures from original natural textures in a spatial three-alternative oddity paradigm under two viewing conditions: when test patches were briefly presented in the near periphery ("parafoveal") and when observers were able to make eye movements to all three patches ("inspection"). Under parafoveal viewing, observers were unable to discriminate 10 of 12 original images from CNN model images, and remarkably, the simpler Portilla and Simoncelli model performed slightly better than the CNN model (11 textures). Under foveal inspection, matching CNN features captured appearance substantially better than the Portilla and Simoncelli model (nine textures compared to four), and additionally matching the power spectrum improved appearance for two of the three remaining textures. None of the models tested here could produce indiscriminable images for one of the 12 textures under the inspection condition. While deep CNN (VGG-19) features can often be used to synthesize textures that humans cannot discriminate from natural textures, there is currently no uniformly best model for all textures and viewing conditions.
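The CNN texture model referred to in the abstract is, given the description and author list, presumably the Gatys et al. (2015) approach, which synthesizes a texture by matching the Gram matrices (channel-by-channel feature correlations) of VGG-19 feature maps; the extended model additionally matches the image's power spectrum. The following is a minimal, hypothetical PyTorch sketch of these two loss terms only; the function names are illustrative, and the study's actual layer choices, term weighting, and optimization procedure are not reproduced here.

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-by-channel feature correlations for one CNN layer.

    features: feature map of shape (channels, height, width).
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)

def texture_loss(synth_feats, target_feats):
    """Sum of squared Gram-matrix differences over a list of layers:
    the core of Gram-matrix texture synthesis."""
    return sum(((gram_matrix(s) - gram_matrix(t)) ** 2).sum()
               for s, t in zip(synth_feats, target_feats))

def spectrum_loss(synth_img: torch.Tensor, target_img: torch.Tensor):
    """Squared difference between power spectra (magnitude of the 2-D
    Fourier transform): the extra term in the spectrum-matched model."""
    p_synth = torch.fft.fft2(synth_img).abs()
    p_target = torch.fft.fft2(target_img).abs()
    return ((p_synth - p_target) ** 2).mean()
```

In a Gatys-style pipeline, synthesis would proceed by gradient descent on the pixels of an initially random image so as to minimize `texture_loss` (plus, for the extended model, `spectrum_loss`) against the features of the target texture.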