IEEE Trans Image Process. 2022;31:3525-3540. doi: 10.1109/TIP.2022.3172208. Epub 2022 May 18.
Understanding foggy image sequences in driving scenes is critical for autonomous driving, but it remains a challenging task due to the difficulty of collecting and annotating real-world images in adverse weather. Recently, the self-training strategy has been considered a powerful solution for unsupervised domain adaptation: it iteratively adapts the model from the source domain to the target domain by generating target pseudo labels and re-training the model. However, the selection of confident pseudo labels inevitably suffers from a conflict between sparsity and accuracy, either of which leads to suboptimal models. To tackle this problem, we exploit the characteristics of foggy image sequences of driving scenes to densify the confident pseudo labels. Specifically, based on two discoveries about sequential image data, local spatial similarity and adjacent temporal correspondence, we propose a novel Target-Domain driven pseudo label Diffusion (TDo-Dif) scheme. It employs superpixels and optical flow to identify spatial similarity and temporal correspondence, respectively, and then diffuses the confident but sparse pseudo labels within a superpixel or across a temporally corresponding pair linked by the flow. Moreover, to ensure the feature similarity of the diffused pixels, we introduce a local spatial similarity loss and a temporal contrastive loss in the model re-training stage. Experimental results show that our TDo-Dif scheme helps the adaptive model achieve 51.92% and 53.84% mean intersection-over-union (mIoU) on two publicly available natural foggy datasets (Foggy Zurich and Foggy Driving), exceeding state-of-the-art unsupervised domain adaptive semantic segmentation methods. The proposed method can also be applied to non-sequential images in the target domain by considering only spatial similarity.
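The spatial half of the diffusion idea can be sketched in a few lines: confident but sparse pseudo labels are spread to the unconfident pixels of each superpixel. The sketch below is a simplified illustration, not the paper's exact formulation; the majority-vote rule and the `min_conf_frac` threshold are assumptions made for the example.

```python
import numpy as np

def diffuse_pseudo_labels(pseudo, superpixels, min_conf_frac=0.3):
    """Spatial pseudo-label diffusion within superpixels (simplified sketch).

    pseudo      : (H, W) int array of pseudo labels, -1 marks unconfident pixels
    superpixels : (H, W) int array of superpixel ids
    min_conf_frac : assumed threshold; only diffuse inside a superpixel if at
                    least this fraction of its pixels is already confident
    """
    out = pseudo.copy()
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        labels = pseudo[mask]
        confident = labels[labels >= 0]
        if confident.size == 0:
            continue  # nothing to diffuse in this superpixel
        if confident.size / labels.size < min_conf_frac:
            continue  # too few confident pixels to trust the local consensus
        # Assumed rule: assign the majority confident label to the whole
        # superpixel, relying on local spatial similarity.
        out[mask] = np.bincount(confident).argmax()
    return out
```

The temporal half of the scheme would analogously propagate labels between frames along optical-flow correspondences; it is omitted here since it requires a flow estimator rather than a few lines of array code.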