A conceptual framework of computations in mid-level vision.

Author information

Kubilius Jonas, Wagemans Johan, Op de Beeck Hans P

Affiliations

Laboratory of Biological Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium; Laboratory of Experimental Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium.

Laboratory of Experimental Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium.

Publication information

Front Comput Neurosci. 2014 Dec 12;8:158. doi: 10.3389/fncom.2014.00158. eCollection 2014.

DOI:10.3389/fncom.2014.00158

PMID:25566044

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC4264474/

Abstract

If a picture is worth a thousand words, as an English idiom goes, what should those words (or, rather, descriptors) capture? What format of image representation would be sufficiently rich if we were to reconstruct the essence of images from their descriptors? In this paper, we set out to develop a conceptual framework that would be: (i) biologically plausible in order to provide a better mechanistic understanding of our visual system; (ii) sufficiently robust to apply in practice on realistic images; and (iii) able to tap into the underlying structure of our visual world. We bring forward three key ideas. First, we argue that surface-based representations are constructed based on feature inference from the input in the intermediate processing layers of the visual system. Such representations are computed in a largely pre-semantic (prior to categorization) and pre-attentive manner using multiple cues (orientation, color, polarity, variation in orientation, and so on), and explicitly retain configural relations between features. The constructed surfaces may be partially overlapping to compensate for occlusions and are ordered in depth (figure-ground organization). Second, we propose that such intermediate representations could be formed by a hierarchical computation of similarity between features in local image patches and pooling of highly similar units, and re-estimated via recurrent loops according to the task demands. Finally, we suggest using datasets composed of realistically rendered artificial objects and surfaces in order to better understand a model's behavior and its limitations.
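
The second idea in the abstract (local similarity followed by pooling of highly similar units) can be made concrete with a toy sketch. This is not the authors' model. It is a minimal single-level illustration, assuming a small grid of orientation values in radians, a cosine-based similarity measure, and a fixed pooling threshold. The names orientation_similarity and pool_similar_units and the threshold value 0.9 are hypothetical choices made for this example; the hierarchy and the recurrent re-estimation described in the abstract are deliberately left out.

```python
import math


def orientation_similarity(a, b):
    """Similarity in [0, 1] between two orientations in radians.

    Orientations are treated as periodic with period pi, so 0 and pi
    count as identical. This measure is an assumption of the sketch.
    """
    return 0.5 * (1.0 + math.cos(2.0 * (a - b)))


def pool_similar_units(grid, threshold=0.9):
    """Pool 4-connected neighbours whose similarity reaches the threshold.

    grid is a list of rows of orientation values. The result is a grid
    of integer labels; cells sharing a label belong to one pooled group,
    a crude stand-in for a candidate surface.
    """
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))  # union-find forest over cells

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Compare each cell with its right and lower neighbour only, so every
    # adjacent pair is visited exactly once.
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    sim = orientation_similarity(grid[r][c], grid[rr][cc])
                    if sim >= threshold:
                        union(r * cols + c, rr * cols + cc)

    # Relabel roots with consecutive integers in scan order.
    labels = {}
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            root = find(r * cols + c)
            row.append(labels.setdefault(root, len(labels)))
        out.append(row)
    return out


if __name__ == "__main__":
    h, v = 0.0, math.pi / 2  # horizontal and vertical orientations
    for row in pool_similar_units([[h, h, v], [h, h, v]]):
        print(row)
```

In this sketch the horizontally oriented cells on the left pool into one group and the vertically oriented column on the right into another. A fuller treatment along the lines of the abstract would combine several cues (color, polarity, variation in orientation), repeat the similarity and pooling steps across layers, and revise the groups through feedback.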

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11e9/4264474/40af779da129/fncom-08-00158-g0001.jpg
