
FCNet: Stereo 3D Object Detection with Feature Correlation Networks.

Authors

Wu Yingyu, Liu Ziyan, Chen Yunlei, Zheng Xuhui, Zhang Qian, Yang Mo, Tang Guangming

Affiliations

College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China.

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China.

Publication

Entropy (Basel). 2022 Aug 14;24(8):1121. doi: 10.3390/e24081121.

DOI: 10.3390/e24081121
PMID: 36010784
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9407267/
Abstract

Deep-learning techniques have significantly improved object detection performance, especially for 3D scenes captured as binocular images. However, supervising depth in stereo 3D object detection by reconstructing dense 3D depth from LiDAR point clouds incurs high computational cost and slows inference. After exploring the intrinsic relationship between the implicit depth information and the semantic texture features of binocular images, we propose FCNet, an efficient and accurate 3D object detection algorithm for stereo images. First, we generate multi-scale feature maps from the input stereo pair and use a normalized dot-product to construct a multi-scale cost volume that encodes implicit depth information. Second, a variant attention model enhances the global and local description of the features, and a sparse-region depth loss supervises the depth regression. Third, to balance channel-information preservation against computational burden when re-fusing the left and right feature maps, a reweighting strategy strengthens the feature correlation when merging the last-layer features of the binocular images. Extensive experiments on the challenging KITTI benchmark demonstrate that the proposed algorithm achieves better performance, including lower computational cost and higher inference speed, in 3D object detection.
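The normalized dot-product cost volume described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the general technique (cosine-similarity correlation between left features and disparity-shifted right features), not the authors' implementation; the function name, feature shapes, and disparity range are assumptions.

```python
import numpy as np

def normalized_dot_cost_volume(left_feat, right_feat, max_disp):
    """Build a correlation cost volume from left/right feature maps.

    left_feat, right_feat: (C, H, W) feature maps from a shared backbone.
    Returns a (max_disp, H, W) volume; slice d holds the normalized
    dot-product between left pixels and right pixels shifted left by d.
    """
    C, H, W = left_feat.shape
    # L2-normalize along channels so the dot product is a cosine similarity.
    ln = left_feat / (np.linalg.norm(left_feat, axis=0, keepdims=True) + 1e-8)
    rn = right_feat / (np.linalg.norm(right_feat, axis=0, keepdims=True) + 1e-8)
    vol = np.zeros((max_disp, H, W), dtype=np.float32)
    for d in range(max_disp):
        # Correlate left pixel at column x with right pixel at column x - d.
        vol[d, :, d:] = np.sum(ln[:, :, d:] * rn[:, :, : W - d], axis=0)
    return vol

# Toy example: random 16-channel feature maps at reduced resolution.
rng = np.random.default_rng(0)
lf = rng.standard_normal((16, 8, 32)).astype(np.float32)
rf = rng.standard_normal((16, 8, 32)).astype(np.float32)
cv = normalized_dot_cost_volume(lf, rf, max_disp=4)
print(cv.shape)  # (4, 8, 32)
```

Because both feature maps are channel-normalized, every entry lies in [-1, 1], and correlating a map with itself at disparity 0 yields 1 everywhere; in the paper this volume is built at multiple feature scales.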

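The channel-reweighting strategy for the re-fused left-right features can be illustrated with a squeeze-and-excitation-style sketch. This shows the general form of such a reweighting (global pooling, a small bottleneck, sigmoid gates per channel); the exact module in FCNet may differ, and all names and shapes here are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reweight_channels(fused, w1, w2):
    """SE-style channel reweighting of a fused (C, H, W) feature map.

    Global average pooling squeezes each channel to a scalar; two small
    linear layers (w1: C -> C//r, w2: C//r -> C) produce per-channel
    gates in (0, 1) that rescale the channels of the fused map.
    """
    squeezed = fused.mean(axis=(1, 2))           # (C,) channel descriptors
    hidden = np.maximum(w1 @ squeezed, 0.0)      # ReLU bottleneck
    gates = sigmoid(w2 @ hidden)                 # (C,) gates in (0, 1)
    return fused * gates[:, None, None]          # rescale each channel

rng = np.random.default_rng(1)
C, r = 8, 2
x = rng.standard_normal((C, 4, 4))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = reweight_channels(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Since every gate is strictly between 0 and 1, the reweighted map never amplifies a channel, only attenuates less informative ones, which matches the stated goal of preserving channel information without extra spatial computation.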

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/170a4368d347/entropy-24-01121-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/20210c1a661a/entropy-24-01121-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/eb77ae81bcd7/entropy-24-01121-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/0ced1fb1958d/entropy-24-01121-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/c1a7bb76aa64/entropy-24-01121-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/a23db12ce41a/entropy-24-01121-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/18c7a8d62423/entropy-24-01121-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/d5a2fbc9fb9e/entropy-24-01121-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/e00840465077/entropy-24-01121-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/42b2b49b7322/entropy-24-01121-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3429/9407267/8202dabe0398/entropy-24-01121-g010.jpg

Similar articles

1
FCNet: Stereo 3D Object Detection with Feature Correlation Networks.
Entropy (Basel). 2022 Aug 14;24(8):1121. doi: 10.3390/e24081121.
2
MS3D: A 3D object detection method using multi-scale semantic feature points to construct 3D feature layer.
Neural Netw. 2024 Nov;179:106623. doi: 10.1016/j.neunet.2024.106623. Epub 2024 Aug 10.
3
PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud.
Sensors (Basel). 2020 Dec 28;21(1):136. doi: 10.3390/s21010136.
4
An end-to-end stereo matching algorithm based on improved convolutional neural network.
Math Biosci Eng. 2020 Nov 6;17(6):7787-7803. doi: 10.3934/mbe.2020396.
5
Relation Graph Network for 3D Object Detection in Point Clouds.
IEEE Trans Image Process. 2021;30:92-107. doi: 10.1109/TIP.2020.3031371. Epub 2020 Nov 18.
6
Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds.
Sensors (Basel). 2020 Jan 28;20(3):704. doi: 10.3390/s20030704.
7
A Two-Phase Cross-Modality Fusion Network for Robust 3D Object Detection.
Sensors (Basel). 2020 Oct 23;20(21):6043. doi: 10.3390/s20216043.
8
Efficient Multi-Scale Stereo-Matching Network Using Adaptive Cost Volume Filtering.
Sensors (Basel). 2022 Jul 23;22(15):5500. doi: 10.3390/s22155500.
9
3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection.
IEEE Trans Pattern Anal Mach Intell. 2018 May;40(5):1259-1272. doi: 10.1109/TPAMI.2017.2706685. Epub 2017 May 19.
10
Joint stereo 3D object detection and implicit surface reconstruction.
Sci Rep. 2024 Jun 17;14(1):13893. doi: 10.1038/s41598-024-64677-2.

References cited in this article

1
Pedestrian Detection with Multi-View Convolution Fusion Algorithm.
Entropy (Basel). 2022 Jan 22;24(2):165. doi: 10.3390/e24020165.
2
MonoGRNet: A General Framework for Monocular 3D Object Detection.
IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5170-5184. doi: 10.1109/TPAMI.2021.3074363. Epub 2022 Aug 4.
3
MonoFENet: Monocular 3D Object Detection with Feature Enhancement Networks.
IEEE Trans Image Process. 2019 Nov 13. doi: 10.1109/TIP.2019.2952201.
4
SECOND: Sparsely Embedded Convolutional Detection.
Sensors (Basel). 2018 Oct 6;18(10):3337. doi: 10.3390/s18103337.
5
Kinematic Control of Redundant Manipulators Using Neural Networks.
IEEE Trans Neural Netw Learn Syst. 2017 Oct;28(10):2243-2254. doi: 10.1109/TNNLS.2016.2574363. Epub 2016 Jun 24.
6
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031. Epub 2016 Jun 6.
7
Distributed Recurrent Neural Networks for Cooperative Control of Manipulators: A Game-Theoretic Perspective.
IEEE Trans Neural Netw Learn Syst. 2017 Feb;28(2):415-426. doi: 10.1109/TNNLS.2016.2516565. Epub 2016 Jan 21.