A conceptual framework of computations in mid-level vision.

Author information

Kubilius Jonas, Wagemans Johan, Op de Beeck Hans P

Affiliations

Laboratory of Biological Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium; Laboratory of Experimental Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium.

Laboratory of Experimental Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium.

Publication information

Front Comput Neurosci. 2014 Dec 12;8:158. doi: 10.3389/fncom.2014.00158. eCollection 2014.

Abstract

If a picture is worth a thousand words, as an English idiom goes, what should those words (or, rather, descriptors) capture? What format of image representation would be sufficiently rich if we were to reconstruct the essence of images from their descriptors? In this paper, we set out to develop a conceptual framework that would be: (i) biologically plausible in order to provide a better mechanistic understanding of our visual system; (ii) sufficiently robust to apply in practice to realistic images; and (iii) able to tap into the underlying structure of our visual world. We bring forward three key ideas. First, we argue that surface-based representations are constructed through feature inference from the input in the intermediate processing layers of the visual system. Such representations are computed in a largely pre-semantic (prior to categorization) and pre-attentive manner using multiple cues (orientation, color, polarity, variation in orientation, and so on), and explicitly retain configural relations between features. The constructed surfaces may partially overlap to compensate for occlusions and are ordered in depth (figure-ground organization). Second, we propose that such intermediate representations could be formed by a hierarchical computation of similarity between features in local image patches and pooling of highly similar units, and re-estimated via recurrent loops according to task demands. Finally, we suggest using datasets composed of realistically rendered artificial objects and surfaces in order to better understand a model's behavior and its limitations.
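The second idea (hierarchical similarity computation over local patches, followed by pooling of highly similar units) can be illustrated with a toy sketch. This is not the authors' model: the non-overlapping 2x2 patches, cosine similarity, the 0.9 threshold, the "medoid" anchor unit, and the doubled-angle orientation encoding are all hypothetical choices made for illustration, and the recurrent, task-dependent re-estimation step is omitted entirely.

```python
import math
from typing import List

Vec = List[float]
Grid = List[List[Vec]]  # rows x cols of feature vectors


def orientation_feature(theta_deg: float) -> Vec:
    """Hypothetical encoding: doubled angle, so 0 and 180 degrees map to the same vector."""
    t = math.radians(2.0 * theta_deg)
    return [math.cos(t), math.sin(t)]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def pool_similar(grid: Grid, patch: int = 2, threshold: float = 0.9) -> Grid:
    """One level: in each non-overlapping patch x patch block, pick the unit most
    similar to the others (the medoid), then average only the units whose cosine
    similarity to it is >= threshold. Dissimilar units are left out rather than
    blurred into the pooled feature."""
    rows, cols = len(grid), len(grid[0])
    out: Grid = []
    for i in range(0, rows - patch + 1, patch):
        out_row: List[Vec] = []
        for j in range(0, cols - patch + 1, patch):
            units = [grid[i + di][j + dj] for di in range(patch) for dj in range(patch)]
            anchor = max(units, key=lambda u: sum(cosine(u, v) for v in units))
            # Fall back to the anchor alone if nothing passes (e.g. a zero vector).
            members = [u for u in units if cosine(anchor, u) >= threshold] or [anchor]
            out_row.append([sum(vals) / len(members) for vals in zip(*members)])
        out.append(out_row)
    return out


def build_hierarchy(grid: Grid, levels: int, patch: int = 2,
                    threshold: float = 0.9) -> List[Grid]:
    """Apply pool_similar repeatedly; stop early once the map is smaller than a patch."""
    maps = [grid]
    for _ in range(levels):
        top = maps[-1]
        if len(top) < patch or len(top[0]) < patch:
            break
        maps.append(pool_similar(top, patch, threshold))
    return maps


# Hypothetical 4x4 orientation map: a vertical boundary between 0-degree and
# 90-degree texture, plus one stray 90-degree unit inside the left region.
left, right = orientation_feature(0), orientation_feature(90)
grid = [[left, left, right, right] for _ in range(4)]
grid[1][1] = right  # stray unit
maps = build_hierarchy(grid, levels=2)
print([len(m) for m in maps])  # [4, 2, 1]
print(maps[1][0][0])  # [1.0, 0.0]
```

In the top-left block, plain average pooling would return roughly `[0.5, 0.0]`, mixing the stray unit into the region's feature, whereas similarity-gated pooling keeps the left region's orientation intact. That contrast is the intuition the sketch is meant to convey, not a claim about how the proposed framework performs.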
