Visual Object Recognition: Do We (Finally) Know More Now Than We Did?

Affiliations

Department of Psychology, Vanderbilt University, Nashville, Tennessee 37240-7817; email:

Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213.

Publication information

Annu Rev Vis Sci. 2016 Oct 14;2:377-396. doi: 10.1146/annurev-vision-111815-114621. Epub 2016 Aug 3.

DOI:10.1146/annurev-vision-111815-114621

PMID:28532357

Abstract

How do we recognize objects despite changes in their appearance? The past three decades have been witness to intense debates regarding both whether objects are encoded invariantly with respect to viewing conditions and whether specialized, separable mechanisms are used for the recognition of different object categories. We argue that such dichotomous debates ask the wrong question. Much more important is the nature of object representations: What are features that enable invariance or differential processing between categories? Although the nature of object features is still an unanswered question, new methods for connecting data to models show significant potential for helping us to better understand neural codes for objects. Most prominently, new approaches to analyzing data from functional magnetic resonance imaging, including neural decoding and representational similarity analysis, and new computational models of vision, including convolutional neural networks, have enabled a much more nuanced understanding of visual representation. Convolutional neural networks are particularly intriguing as a tool for studying biological vision in that this class of artificial vision systems, based on biologically plausible deep neural networks, exhibits visual recognition capabilities that are approaching those of human observers. As these models improve in their recognition performance, it appears that they also become more effective in predicting and accounting for neural responses in the ventral cortex. Applying these and other deep models to empirical data shows great promise for enabling future progress in the study of visual recognition.

Similar articles

Visual object recognition: do we know more now than we did 20 years ago?

Annu Rev Psychol. 2007;58:75-96. doi: 10.1146/annurev.psych.58.102904.190114.

The Ventral Visual Pathway Represents Animal Appearance over Animacy, Unlike Human Behavior and Deep Neural Networks.

J Neurosci. 2019 Aug 14;39(33):6513-6525. doi: 10.1523/JNEUROSCI.1714-18.2019. Epub 2019 Jun 13.

Non-accidental properties, metric invariance, and encoding by neurons in a model of ventral stream visual object recognition, VisNet.

Neurobiol Learn Mem. 2018 Jul;152:20-31. doi: 10.1016/j.nlm.2018.04.017. Epub 2018 May 1.

Separability of abstract-category and specific-exemplar visual object subsystems: evidence from fMRI pattern analysis.

Brain Cogn. 2015 Feb;93:54-63. doi: 10.1016/j.bandc.2014.11.007. Epub 2014 Dec 18.

Factorized visual representations in the primate visual system and deep neural networks.

Elife. 2024 Jul 5;13:RP91685. doi: 10.7554/eLife.91685.

Category-selective patterns of neural response in the ventral visual pathway in the absence of categorical information.

Neuroimage. 2016 Jul 15;135:107-14. doi: 10.1016/j.neuroimage.2016.04.060. Epub 2016 Apr 28.

Multivariate Patterns in the Human Object-Processing Pathway Reveal a Shift from Retinotopic to Shape Curvature Representations in Lateral Occipital Areas, LO-1 and LO-2.

J Neurosci. 2016 May 25;36(21):5763-74. doi: 10.1523/JNEUROSCI.3603-15.2016.

Mid-level visual features underlie the high-level categorical organization of the ventral stream.

Proc Natl Acad Sci U S A. 2018 Sep 18;115(38):E9015-E9024. doi: 10.1073/pnas.1719616115. Epub 2018 Aug 31.

Interaction between Scene and Object Processing Revealed by Human fMRI and MEG Decoding.

J Neurosci. 2017 Aug 9;37(32):7700-7710. doi: 10.1523/JNEUROSCI.0582-17.2017. Epub 2017 Jul 7.

Cited by

Binocular cues to 3D face structure increase activation in depth-selective visual cortex with negligible effects in face-selective areas.

J Vis. 2025 Sep 2;25(11):6. doi: 10.1167/jov.25.11.6.

Gradual change of cortical representations with growing visual expertise for synthetic shapes.

Imaging Neurosci (Camb). 2024 Aug 6;2. doi: 10.1162/imag_a_00255. eCollection 2024.

Incidental learning of predictive temporal context within cortical representations of visual shape.

Imaging Neurosci (Camb). 2024 Aug 30;2. doi: 10.1162/imag_a_00278. eCollection 2024.

Making sense of transformer success.

Front Artif Intell. 2025 Apr 1;8:1509338. doi: 10.3389/frai.2025.1509338. eCollection 2025.

Enhanced and idiosyncratic neural representations of personally typical scenes.

Proc Biol Sci. 2025 Mar;292(2043):20250272. doi: 10.1098/rspb.2025.0272. Epub 2025 Mar 26.

A neural computational framework for face processing in the human temporal lobe.

Curr Biol. 2025 Apr 21;35(8):1765-1778.e6. doi: 10.1016/j.cub.2025.02.063. Epub 2025 Mar 20.

Neural correlates of minimal recognizable configurations in the human brain.

Cell Rep. 2025 Mar 25;44(3):115429. doi: 10.1016/j.celrep.2025.115429. Epub 2025 Mar 16.

Human infant EEG recordings for 200 object images presented in rapid visual streams.

Sci Data. 2025 Mar 8;12(1):407. doi: 10.1038/s41597-025-04744-z.

Greater neural pattern similarity to the native language is associated with better novel word learning.

Front Psychol. 2024 Dec 4;15:1456373. doi: 10.3389/fpsyg.2024.1456373. eCollection 2024.

The roles of symmetry and elongation in developing reference frames.

Front Psychol. 2024 Jul 1;15:1402156. doi: 10.3389/fpsyg.2024.1402156. eCollection 2024.
