Why is real-world visual object recognition hard?

Authors

Pinto Nicolas, Cox David D, DiCarlo James J

Affiliation

McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.

Publication information

PLoS Comput Biol. 2008 Jan;4(1):e27. doi: 10.1371/journal.pcbi.0040027.

DOI:10.1371/journal.pcbi.0040027

PMID:18225950

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2211529/

Abstract

Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain's anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, "natural" images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled "natural" images in guiding that progress. In particular, we show that a simple V1-like model--a neuroscientist's "null" model, which should perform poorly at real-world visual object recognition tasks--outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a "simpler" recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition--real-world image variation.
