College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China.
Sensors (Basel). 2022 Aug 23;22(17):6344. doi: 10.3390/s22176344.
This paper addresses the weak adaptability of current monocular depth estimation algorithms to viewing-angle changes: methods based on convolutional neural networks (CNNs) often lack estimation accuracy and robustness under such transformations. The paper proposes a lightweight network based on convolution and capsule feature fusion (CNNapsule). First, a fusion block module is introduced that integrates CNN features with matrix capsule features to improve the network's adaptability to perspective transformations. The fused features and the deconvolution features are then combined through skip connections to generate the depth image. In addition, a loss function is designed to account for the long-tail distribution of the datasets, gradient similarity, and structural similarity. Finally, comparisons on the NYU Depth V2 and KITTI datasets show that the proposed method achieves better accuracy on the C1 and C2 indices and a better visual effect than traditional methods and deep learning methods trained without transfer learning. The method requires 65% fewer trainable parameters than methods reported in the literature. Its generalization is verified through comparative tests on data collected from the internet and with mobile phones.
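The abstract does not give the loss function itself, only the three considerations behind it (long-tail depth distribution, gradient similarity, structural similarity). As a rough illustration of how such a composite loss could be assembled, the PyTorch sketch below combines a log-scaled point-wise term (to temper the influence of the long-tail depth distribution) with gradient-matching and SSIM terms. The log1p weighting, the 3x3-pooled single-scale SSIM, and all weights and function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def gradient_loss(pred, gt):
    # Gradient-similarity term: match horizontal and vertical depth gradients.
    dx_pred = pred[:, :, :, 1:] - pred[:, :, :, :-1]
    dx_gt   = gt[:, :, :, 1:]   - gt[:, :, :, :-1]
    dy_pred = pred[:, :, 1:, :] - pred[:, :, :-1, :]
    dy_gt   = gt[:, :, 1:, :]   - gt[:, :, :-1, :]
    return (dx_pred - dx_gt).abs().mean() + (dy_pred - dy_gt).abs().mean()

def ssim_loss(pred, gt, C1=0.01 ** 2, C2=0.03 ** 2):
    # Structural-similarity term: single-scale SSIM via 3x3 average pooling.
    mu_p = F.avg_pool2d(pred, 3, 1)
    mu_g = F.avg_pool2d(gt, 3, 1)
    sigma_p  = F.avg_pool2d(pred * pred, 3, 1) - mu_p ** 2
    sigma_g  = F.avg_pool2d(gt * gt, 3, 1) - mu_g ** 2
    sigma_pg = F.avg_pool2d(pred * gt, 3, 1) - mu_p * mu_g
    ssim = ((2 * mu_p * mu_g + C1) * (2 * sigma_pg + C2)) / (
        (mu_p ** 2 + mu_g ** 2 + C1) * (sigma_p + sigma_g + C2))
    return ((1 - ssim) / 2).clamp(0, 1).mean()

def depth_loss(pred, gt, w_depth=0.1, w_grad=1.0, w_ssim=1.0):
    # Point-wise term in log space so that rare, very large depths (the long
    # tail) do not dominate training; weights here are illustrative.
    l_depth = torch.log1p((pred - gt).abs()).mean()
    return w_depth * l_depth + w_grad * gradient_loss(pred, gt) + w_ssim * ssim_loss(pred, gt)
```

Inputs are assumed to be (N, 1, H, W) depth maps; in practice the three weights would be tuned per dataset.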