Brain-inspired automated visual object discovery and detection.

Affiliations

Department of Electrical and Computer Engineering, University of California, Los Angeles, CA 90095.

Department of Electrical Engineering, Stanford University, Stanford, CA 94305.

Publication information

Proc Natl Acad Sci U S A. 2019 Jan 2;116(1):96-105. doi: 10.1073/pnas.1802103115. Epub 2018 Dec 17.

Abstract

Despite significant recent progress, machine vision systems lag considerably behind their biological counterparts in performance, scalability, and robustness. A distinctive hallmark of the brain is its ability to automatically discover and model objects, at multiscale resolutions, from repeated exposures to unlabeled contextual data and then to robustly detect the learned objects under various nonideal circumstances, such as partial occlusion and different view angles. Replication of such capabilities in a machine would require three key ingredients: (i) access to large-scale perceptual data of the kind that humans experience, (ii) flexible representations of objects, and (iii) an efficient unsupervised learning algorithm. The Internet fortunately provides unprecedented access to vast amounts of visual data. This paper leverages the availability of such data to develop a scalable framework for unsupervised learning of object prototypes: brain-inspired, flexible, scale- and shift-invariant representations of deformable objects (e.g., humans, motorcycles, cars, airplanes) composed of parts, their different configurations and views, and their spatial relationships. Computationally, the object prototypes are represented as geometric associative networks using probabilistic constructs such as Markov random fields. We apply our framework to various datasets and show that our approach is computationally scalable and can construct accurate and operational part-aware object models much more efficiently than approaches in much of the recent computer vision literature. We also present efficient algorithms for detection and localization in new scenes of objects and their partial views.
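The abstract does not spell out the form of the model, so the following is only a minimal sketch of the general idea of a part-based object model expressed as a pairwise Markov random field. It is not the authors' algorithm. The part names, expected offsets, `SIGMA`, `MISSING_PENALTY`, and all function names are hypothetical choices made for illustration. Offsets between connected parts are scored relative to one another (so the score is unchanged by shifting the whole object) and divided by a scale factor, and a part may be left unassigned at a fixed cost to mimic a partial view.

```python
import math
from itertools import product

# Hypothetical toy model: expected offsets (dx, dy) between connected parts
# and a shared standard deviation. None of these values come from the paper.
EDGES = {
    ("head", "torso"): (0.0, 1.0),
    ("torso", "legs"): (0.0, 1.5),
}
SIGMA = 0.3
MISSING_PENALTY = 2.0  # cost charged for each undetected (e.g. occluded) part


def pairwise_energy(pos_a, pos_b, expected_offset, scale=1.0):
    """Quadratic (Gaussian) energy for the offset between two parts."""
    dx = (pos_b[0] - pos_a[0]) / scale - expected_offset[0]
    dy = (pos_b[1] - pos_a[1]) / scale - expected_offset[1]
    return (dx * dx + dy * dy) / (2.0 * SIGMA ** 2)


def config_energy(assignment, unary, scale=1.0):
    """Total energy of one configuration; lower is better.

    assignment maps part -> (x, y), or None when the part is treated as missing.
    unary maps (part, (x, y)) -> appearance cost of placing the part there.
    """
    energy = 0.0
    for part, pos in assignment.items():
        energy += MISSING_PENALTY if pos is None else unary[(part, pos)]
    for (a, b), offset in EDGES.items():
        # Spatial terms apply only when both connected parts are present.
        if assignment[a] is not None and assignment[b] is not None:
            energy += pairwise_energy(assignment[a], assignment[b], offset, scale)
    return energy


def best_configuration(candidates, unary, scale=1.0):
    """Exhaustive minimum-energy search; adequate only for a toy example."""
    parts = list(candidates)
    options = [candidates[p] + [None] for p in parts]
    best = min(
        (dict(zip(parts, combo)) for combo in product(*options)),
        key=lambda a: config_energy(a, unary, scale),
    )
    return best, config_energy(best, unary, scale)
```

Calling `best_configuration` with candidate locations for `head` and `torso` but an empty candidate list for `legs` returns a configuration in which `legs` is `None`, which is one simple way to score a partially occluded object. A practical system would replace the exhaustive search with an inference method suited to the graph structure.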
