The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece.
Universidad Politécnica de Madrid, 28040 Madrid, Spain.
Sensors (Basel). 2020 Jul 10;20(14):3855. doi: 10.3390/s20143855.
In this paper, two novel and practical regularization methods are proposed to improve existing neural network architectures for monocular optical flow estimation. The proposed methods aim to alleviate deficiencies of current methods, such as flow leakage across objects and motion inconsistency within rigid objects, by exploiting contextual information. More specifically, the first regularization method uses semantic information during training to explicitly regularize the produced optical flow field. The novelty of this method lies in the use of semantic segmentation masks to teach the network to implicitly identify the semantic edges of an object and to reason better about the local motion flow. A novel loss function is introduced that takes into account the objects' boundaries, as derived from the semantic segmentation mask, to selectively penalize motion inconsistency within an object. The method is architecture agnostic and can be integrated into any neural network without modifying the network or adding complexity at inference. The second regularization method adds spatial awareness to the input data of the network in order to improve training stability and efficiency. The coordinates of each pixel are used as an additional feature, breaking the invariance properties of the neural network architecture. The additional features are shown to implicitly regularize the optical flow estimation by enforcing a consistent flow, while improving performance and reducing convergence time. Finally, the two regularization methods are complementary: their combination further improves the performance of existing cutting-edge architectures, both quantitatively and qualitatively, on popular flow estimation benchmark datasets.
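The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's formulation: the abstract does not give the loss or the coordinate encoding, so the function names, the L1 penalty over 4-connected neighbours, and the normalization of coordinates to [-1, 1] are all assumptions made here for illustration.

```python
import numpy as np


def add_coordinate_channels(image):
    """Append normalized x and y pixel coordinates as two extra channels.

    image: array of shape (H, W, C); returns an array of shape (H, W, C + 2).
    The [-1, 1] normalization is an assumption, not taken from the paper.
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, h), np.linspace(-1.0, 1.0, w), indexing="ij"
    )
    return np.concatenate([image, xs[..., None], ys[..., None]], axis=-1)


def semantic_smoothness_loss(flow, labels):
    """Mean L1 flow difference between neighbouring pixels of the same class.

    flow:   (H, W, 2) optical flow field.
    labels: (H, W) integer semantic segmentation mask.
    Neighbour pairs that straddle a semantic boundary are ignored, so flow
    discontinuities are only penalized inside an object (hypothetical loss).
    """
    # L1 flow differences between horizontal and vertical neighbours.
    dx = np.abs(flow[:, 1:] - flow[:, :-1]).sum(axis=-1)
    dy = np.abs(flow[1:, :] - flow[:-1, :]).sum(axis=-1)
    # Keep only the pairs whose two pixels share a semantic label.
    same_x = labels[:, 1:] == labels[:, :-1]
    same_y = labels[1:, :] == labels[:-1, :]
    n_pairs = same_x.sum() + same_y.sum()
    if n_pairs == 0:
        return 0.0
    return float((dx[same_x].sum() + dy[same_y].sum()) / n_pairs)
```

Under these assumptions, a flow field whose only discontinuity coincides with a semantic boundary incurs zero penalty, whereas a discontinuity inside a single segment is penalized. In a training setup the second function would be added to the supervised flow loss only, which is consistent with the abstract's statement that nothing changes at inference.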