Yang Jihan, Shi Shaoshuai, Wang Zhe, Li Hongsheng, Qi Xiaojuan
IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):6354-6371. doi: 10.1109/TPAMI.2022.3216606. Epub 2023 Apr 3.
In this paper, we present a self-training method, named ST3D++, with a holistic pseudo label denoising pipeline for unsupervised domain adaptation on 3D object detection. ST3D++ aims to reduce noise in pseudo label generation and to alleviate the negative impact of noisy pseudo labels on model training. First, ST3D++ pre-trains the 3D object detector on the labeled source domain with random object scaling (ROS), which is designed to reduce target domain pseudo label noise arising from the object scale bias of the source domain. Then, the detector is progressively improved by alternating between generating pseudo labels and training the object detector on pseudo-labeled target domain data. Here, we equip the pseudo label generation process with a hybrid quality-aware triplet memory to improve the quality and stability of the generated pseudo labels. Meanwhile, in the model training stage, we propose a source data assisted training strategy and a curriculum data augmentation policy to effectively rectify noisy gradient directions and keep the model from over-fitting to noisy pseudo-labeled data. These designs enable the detector to be trained on meticulously refined pseudo-labeled target data with denoised training signals, and thus effectively facilitate adapting an object detector to a target domain without requiring annotations. Finally, our method is assessed on four 3D benchmark datasets (i.e., Waymo, KITTI, Lyft, and nuScenes) for three common categories (i.e., car, pedestrian, and bicycle). ST3D++ achieves state-of-the-art performance on all evaluated settings, outperforming the corresponding baseline by a large margin (e.g., 9.6% ∼ 38.16% on Waymo → KITTI in terms of AP3D), and even surpasses the fully supervised oracle results on the KITTI 3D object detection benchmark with target prior. Code is available at https://github.com/CVMI-Lab/ST3D.
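To make the random object scaling (ROS) step more concrete, the following is a minimal NumPy sketch of one way such an augmentation could work: each annotated box and the points inside it are scaled by a random factor about the box center. It is an illustration under stated assumptions, not the released ST3D++ implementation. The function name `random_object_scaling`, the box layout `[cx, cy, cz, dx, dy, dz, heading]`, and the default `scale_range` are all hypothetical choices made for this example.

```python
import numpy as np

def random_object_scaling(points, boxes, scale_range=(0.75, 1.25), rng=None):
    """Illustrative ROS sketch: scale each box and the points inside it.

    points: (N, 3) array of x, y, z coordinates.
    boxes:  (M, 7) array of [cx, cy, cz, dx, dy, dz, heading] (assumed layout).
    scale_range: hypothetical default, not a value taken from the paper.
    Returns new (points, boxes) arrays; the inputs are left unmodified.
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float).copy()
    boxes = np.asarray(boxes, dtype=float).copy()
    for box in boxes:  # each row is a view, so edits update `boxes`
        center, dims, heading = box[:3], box[3:6], box[6]
        cos_h, sin_h = np.cos(heading), np.sin(heading)
        # Rotation about z that maps box-local coordinates to world coordinates.
        rot = np.array([[cos_h, -sin_h, 0.0],
                        [sin_h,  cos_h, 0.0],
                        [0.0,    0.0,   1.0]])
        # Row-vector form of rot^T (p - c): points expressed in the box frame.
        local = (points - center) @ rot
        inside = np.all(np.abs(local) <= dims / 2.0, axis=1)
        scale = rng.uniform(*scale_range)
        # Scale the object's points about the box center, then map back.
        points[inside] = (local[inside] * scale) @ rot.T + center
        # Scale the box dimensions by the same factor.
        box[3:6] = dims * scale
    return points, boxes
```

This sketch processes boxes one after another and scales about the box center along all three axes, so it does not handle overlapping boxes or keep objects resting on the ground plane; a practical pipeline would need to decide how to treat both cases.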