Lukanov Hristofor, König Peter, Pipa Gordon
Department of Neuroinformatics, Institute of Cognitive Science, Osnabrück University, Osnabrück, Germany.
Department of Neurobiopsychology, Institute of Cognitive Science, Osnabrück University, Osnabrück, Germany.
Front Comput Neurosci. 2021 Nov 22;15:746204. doi: 10.3389/fncom.2021.746204. eCollection 2021.
While abundant in biology, foveated vision is nearly absent from computational models, and especially from deep learning architectures. Despite considerable hardware improvements, training deep neural networks remains a challenge and constrains the complexity of models. Here we propose an end-to-end neural model for foveal-peripheral vision, inspired by retino-cortical mapping in primates and humans. Our model uses an efficient sampling technique that compresses the visual signal such that a small portion of the scene is perceived in high resolution while a large field of view is maintained in low resolution. An attention mechanism for performing "eye movements" helps the agent collect detailed information incrementally from the observed scene. Our model achieves results comparable to those of a similar neural architecture trained on full-resolution data for image classification, and outperforms it on video classification tasks. At the same time, because of the smaller size of its input, it can reduce computational effort tenfold and uses several times less memory. Moreover, we present an easy-to-implement bottom-up and top-down attention mechanism that relies on task-relevant features and is therefore a convenient byproduct of the main architecture. Apart from its computational efficiency, the presented work provides a means of exploring active vision for agent training in simulated environments and anthropomorphic robotics.
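The idea of perceiving a small region in high resolution while keeping a wide field of view in low resolution can be illustrated with a minimal sketch. This is not the sampling technique of the paper, which is inspired by retino-cortical mapping; it is a simplified stand-in that crops a full-resolution "fovea" patch around a fixation point and block-averages the whole image into a coarse "periphery". The function name `foveate` and its parameters `center`, `fovea_size`, and `factor` are hypothetical, and the image sizes are chosen only for illustration.

```python
import numpy as np


def foveate(image, center, fovea_size=16, factor=4):
    """Split a 2-D grayscale image into a sharp fovea crop and a coarse periphery.

    Simplified illustration (crop + block average), not the paper's method.
    Assumes fovea_size is even and no larger than either image dimension.
    """
    h, w = image.shape
    half = fovea_size // 2

    # Clamp the fixation point so the crop stays fully inside the image.
    cy = min(max(center[0], half), h - half)
    cx = min(max(center[1], half), w - half)

    # Fovea: full-resolution patch around the (clamped) fixation point.
    fovea = image[cy - half:cy + half, cx - half:cx + half]

    # Periphery: whole field of view, downsampled by averaging
    # non-overlapping factor x factor blocks (edges that do not fill
    # a complete block are dropped).
    hh, ww = h - h % factor, w - w % factor
    blocks = image[:hh, :ww].reshape(hh // factor, factor, ww // factor, factor)
    periphery = blocks.mean(axis=(1, 3))

    return fovea, periphery


# One "fixation" on a synthetic 64 x 64 image.
image = np.arange(64 * 64, dtype=float).reshape(64, 64)
fovea, periphery = foveate(image, center=(32, 32))

# Ratio of original values to values kept by the two streams.
compression = image.size / (fovea.size + periphery.size)
```

With these illustrative settings, both streams are 16 x 16, so the input shrinks from 4096 to 512 values. Calling `foveate` repeatedly with different `center` values mimics a sequence of fixations; how those fixation points are chosen is the job of the attention mechanism described in the abstract and is not modeled here.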