Park Gyuhee, Koh Junho, Kim Jisong, Moon Jun, Choi Jun Won
Department of Electrical Engineering, Hanyang University, Seoul 04763, Republic of Korea.
Department of Electrical and Computer Engineering, College of Liberal Studies, Seoul National University, Seoul 08826, Republic of Korea.
Sensors (Basel). 2024 Jul 18;24(14):4667. doi: 10.3390/s24144667.
Recently, the growing demand for autonomous driving in industry has generated considerable interest in 3D object detection, resulting in many excellent 3D object detection algorithms. However, most 3D object detectors operate on only a single set of LiDAR points, ignoring the potential performance gains offered by consecutive sets of LiDAR points. In this paper, we propose a novel 3D object detection method called temporal motion-aware 3D object detection (TM3DOD), which utilizes temporal LiDAR data. In the proposed TM3DOD method, we aggregate LiDAR voxels over time and enhance the current bird's-eye-view (BEV) features by generating motion features from consecutive BEV feature maps. First, we present the temporal voxel encoder (TVE), which generates voxel representations by capturing the temporal relationships among the point sets within a voxel. Next, we design a motion-aware feature aggregation network (MFANet), which enhances the current BEV feature representation by quantifying the temporal variation between two consecutive BEV feature maps. By analyzing how the BEV feature maps differ and change over time, MFANet captures motion information and integrates it into the current feature representation, enabling more robust and accurate detection of 3D objects. Experimental evaluations on the nuScenes benchmark dataset demonstrate that the proposed TM3DOD method achieves significant improvements in 3D detection performance compared with the baseline methods, and performance comparable to state-of-the-art approaches.
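The core MFANet idea described above, deriving a motion cue from the difference between two consecutive BEV feature maps and fusing it back into the current representation, can be sketched minimally as follows. This is an illustrative assumption, not the paper's implementation: the function name, the scalar fusion weight, and the plain NumPy arrays standing in for BEV feature tensors are all hypothetical.

```python
import numpy as np

def motion_aware_fusion(bev_prev: np.ndarray, bev_curr: np.ndarray,
                        w_motion: float = 0.5) -> np.ndarray:
    """Hypothetical sketch of motion-aware BEV fusion.

    The temporal variation between consecutive BEV feature maps is
    treated as a motion feature and blended into the current features.
    (The actual MFANet learns this aggregation; here a fixed scalar
    weight stands in for the learned fusion.)
    """
    motion = bev_curr - bev_prev          # temporal variation between frames
    return bev_curr + w_motion * motion   # enhance current features with the motion cue

# Toy example: two (channels, H, W) BEV feature maps.
prev = np.zeros((1, 4, 4))
curr = np.ones((1, 4, 4))
fused = motion_aware_fusion(prev, curr)   # every element becomes 1 + 0.5 * 1 = 1.5
```

In the paper, this aggregation is learned by a network rather than a fixed blend, but the sketch shows the data flow: only the frame-to-frame change contributes the motion signal, so static background regions leave the current features unchanged.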