Department of Information Engineering and Mathematics, University of Siena, Italy.
Neural Netw. 2020 Jun;126:275-299. doi: 10.1016/j.neunet.2020.03.013. Epub 2020 Mar 20.
Humans are continuously exposed to a stream of visual data with a natural temporal structure. However, most successful computer vision algorithms work at the image level, completely discarding the precious information carried by motion. In this paper, we claim that processing visual streams naturally leads to the formulation of the motion invariance principle, which enables the construction of a new theory of learning that originates from variational principles, just as in physics. Such a principled approach is well suited to discussing a number of interesting questions that arise in vision, and it offers a well-posed computational scheme for the discovery of convolutional filters over the retina. Unlike traditional convolutional networks, which need massive supervision, the proposed theory offers a truly new scenario for the unsupervised processing of video signals, where features are extracted in a multi-layer architecture with motion invariance. While the theory enables the implementation of novel computer vision systems, it also sheds light on the role of information-based principles in driving possible biological solutions.