Eldesokey Abdelrahman, Felsberg Michael, Khan Fahad Shahbaz
IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2423-2436. doi: 10.1109/TPAMI.2019.2929170. Epub 2019 Jul 17.
Generally, convolutional neural networks (CNNs) process data on a regular grid, e.g., data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open research problem with numerous applications in autonomous driving, robotics, and surveillance. In this paper, we propose an algebraically constrained normalized convolution layer for CNNs with highly sparse input that has fewer network parameters than related work. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. We also propose an objective function that simultaneously minimizes the data error while maximizing the output confidence. To integrate structural information, we also investigate fusion strategies to combine depth and RGB information in our normalized convolution network framework. In addition, we introduce the use of output confidence as auxiliary information to improve the results. The capabilities of our normalized convolution network framework are demonstrated for the problem of scene depth completion. Comprehensive experiments are performed on the KITTI-Depth and the NYU-Depth-v2 datasets. The results clearly demonstrate that the proposed approach achieves superior performance while requiring only about 1-5 percent of the number of parameters of state-of-the-art methods.
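As a rough illustration of the normalized convolution and confidence propagation described above, the following NumPy sketch applies a single layer to a sparse depth map. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the classical normalized convolution form (confidence-weighted average over the filter window), the use of softplus to keep the filter non-negative as a stand-in for the algebraic constraint, and the normalization of the propagated confidence by the filter sum are all assumptions. The names `softplus`, `conv2d_same`, and `normalized_conv` are hypothetical helpers introduced here.

```python
import numpy as np


def softplus(x):
    # Maps unconstrained weights to strictly positive values (assumed constraint).
    return np.log1p(np.exp(x))


def conv2d_same(img, kernel):
    # Zero-padded 2D cross-correlation with "same" output size (odd kernel sizes).
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def normalized_conv(depth, conf, raw_weights, eps=1e-8):
    # Non-negative filter obtained from unconstrained parameters.
    w = softplus(raw_weights)
    # Confidence-weighted average of the valid samples inside each window.
    num = conv2d_same(depth * conf, w)
    den = conv2d_same(conf, w)
    out = num / (den + eps)
    # Propagated confidence, scaled to [0, 1] by the filter sum (assumed rule).
    conf_out = den / w.sum()
    return out, conf_out


# Toy input: a 5x5 depth map with a single valid measurement at the center.
depth = np.zeros((5, 5))
conf = np.zeros((5, 5))
depth[2, 2] = 4.0
conf[2, 2] = 1.0

filled, conf_next = normalized_conv(depth, conf, np.zeros((3, 3)))
print(np.round(filled, 3))
print(np.round(conf_next, 3))
```

In this toy case the single measurement is spread over its 3x3 neighborhood while the propagated confidence there drops below 1, which is the behavior a following layer or a confidence-aware loss could exploit. Pixels with no valid sample in their window stay at zero depth and zero confidence.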