Object detection through search with a foveated visual system.

Humans and many other species sense visual information with varying spatial resolution across the visual field (foveated vision) and deploy eye movements to actively sample regions of interest in scenes. The advantage of such a varying-resolution architecture is a reduced computational, and hence metabolic, cost. But what are the performance costs of such a processing strategy relative to a scheme that processes the entire visual field at high spatial resolution? Here we first focus on visual search and combine object detectors from computer vision with a recent model of peripheral pooling regions found at the V1 layer of the human visual system. We develop a foveated object detector that processes the entire scene with varying resolution, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image, and integrates observations across multiple fixations. We compared the foveated object detector against a non-foveated version of the same object detector, which processes the entire image at homogeneous high spatial resolution. We evaluated the accuracy of the foveated and non-foveated object detectors in identifying 20 different object classes in scenes from a standard computer vision dataset (PASCAL VOC 2007). We show that the foveated object detector can approximate the performance of the object detector with homogeneous high spatial resolution processing while bringing significant computational cost savings. Additionally, we assessed the impact of foveation on the computation of bottom-up saliency. An implementation of a simple foveated bottom-up saliency model with eye movements selected the top salient regions of scenes in agreement with those selected by a non-foveated high-resolution saliency model.
Together, our results might help explain the evolution of foveated visual systems with eye movements as a solution that preserves perceptual performance in visual search while yielding computational and metabolic savings for the brain.
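The search procedure summarized above can be illustrated with a toy sketch. The code below is not the authors' implementation and rests on several assumptions. Pooling regions are taken to have a constant radius inside a fovea and to grow linearly with eccentricity outside it. Eccentricity is measured in pixels rather than degrees of visual angle. The retino-specific classifiers are replaced by a hypothetical acuity fall-off applied to precomputed full-resolution detector scores. All names (`pooling_radius`, `region_count`, `acuity`, `foveated_search`) and constants (`FOVEA_RADIUS`, `SCALE`, `E2`) are illustrative. `region_count` estimates how many pooling regions tile the image, which gives a rough sense of the computational saving. `foveated_search` greedily fixates the candidate with the most integrated evidence and keeps, for each location, the best observation made across fixations.

```python
import math

# Hypothetical parameters, chosen for illustration only (not from the paper).
FOVEA_RADIUS = 2.0  # pooling radius (pixels) inside the fovea
SCALE = 0.25        # assumed linear growth of pooling radius with eccentricity
E2 = 40.0           # eccentricity (pixels) at which the assumed acuity halves


def pooling_radius(ecc):
    """Pooling-region radius at eccentricity `ecc`: constant in the fovea,
    growing linearly with eccentricity in the periphery (assumed form)."""
    return max(FOVEA_RADIUS, SCALE * ecc)


def region_count(width, height, fixation, foveated=True):
    """Approximate number of pooling regions needed to tile a width x height
    image: each pixel contributes 1 / (area of the region covering it)."""
    fx, fy = fixation
    total = 0.0
    for y in range(height):
        for x in range(width):
            ecc = math.hypot(x - fx, y - fy)
            r = pooling_radius(ecc) if foveated else FOVEA_RADIUS
            total += 1.0 / (math.pi * r * r)
    return total


def acuity(ecc):
    """Assumed fall-off of detector reliability with eccentricity."""
    return 1.0 / (1.0 + ecc / E2)


def foveated_search(candidates, start, n_fixations=3):
    """Greedy search over candidate locations.

    `candidates` maps (x, y) -> full-resolution detector score (hypothetical).
    From each fixation every candidate is observed at reduced reliability;
    evidence is integrated by keeping the best observation so far, and the
    next fixation goes to the unvisited candidate with the most evidence.
    """
    evidence = {loc: 0.0 for loc in candidates}
    path = [start]
    while True:
        fx, fy = path[-1]
        for loc, score in candidates.items():
            ecc = math.hypot(loc[0] - fx, loc[1] - fy)
            evidence[loc] = max(evidence[loc], score * acuity(ecc))
        unvisited = [loc for loc in candidates if loc not in path]
        if len(path) == n_fixations or not unvisited:
            return evidence, path
        path.append(max(unvisited, key=evidence.get))


if __name__ == "__main__":
    fix = (32, 32)
    fov = region_count(64, 64, fix, foveated=True)
    uni = region_count(64, 64, fix, foveated=False)
    print(f"foveated / uniform pooling regions: {fov / uni:.2f}")
    cands = {(10, 10): 0.9, (50, 50): 0.2, (40, 12): 0.6}
    ev, path = foveated_search(cands, start=fix)
    print(path)  # [(32, 32), (10, 10), (40, 12)]
```

Keeping the maximum observation is only one simple integration rule. A probabilistic alternative, such as accumulating log-likelihood ratios across fixations, would fit the same loop.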

