Lukas Hoyer, Dengxin Dai, Luc Van Gool
IEEE Trans Pattern Anal Mach Intell. 2024 Jan;46(1):220-235. doi: 10.1109/TPAMI.2023.3320613. Epub 2023 Dec 5.
Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA&DG semantic segmentation methods are mostly based on outdated networks, we benchmark more recent architectures, reveal the potential of Transformers, and design the DAFormer network tailored for UDA&DG. It is enabled by three training strategies to avoid overfitting to the source domain: (1) Rare Class Sampling mitigates the bias toward common source-domain classes, while (2) a Thing-Class ImageNet Feature Distance and (3) a learning rate warmup promote feature transfer from ImageNet pretraining. As UDA&DG are usually GPU memory intensive, most previous methods downscale or crop images. However, low-resolution predictions often fail to preserve fine details, while models trained on cropped images fall short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution framework for UDA&DG that combines the strengths of small high-resolution crops, which preserve fine segmentation details, and large low-resolution crops, which capture long-range context dependencies, using a learned scale attention. DAFormer and HRDA improve the state of the art in UDA&DG by more than 10 mIoU on 5 different benchmarks.
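The Rare Class Sampling idea from the abstract can be sketched as a temperature-controlled softmax over class rarity, so that source images containing rare classes are drawn more often. This is a minimal illustration, not the paper's exact formulation; the function name and the temperature value are assumptions for the example.

```python
import math

def rare_class_sampling_probs(class_freqs, temperature=0.1):
    """Softmax over (1 - frequency) / T: rarer classes receive a
    higher sampling probability; a lower temperature sharpens the
    distribution toward the rarest classes.
    (Illustrative sketch; see the paper for the exact scheme.)"""
    logits = [(1.0 - f) / temperature for f in class_freqs]
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Example: three classes covering 60%, 30%, and 10% of source pixels.
# The rarest class (10%) ends up with the highest sampling probability.
probs = rare_class_sampling_probs([0.6, 0.3, 0.1], temperature=0.1)
```

Lowering the temperature biases sampling ever more strongly toward the rarest classes, which counteracts the long-tailed class distribution of the source domain.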
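HRDA's multi-resolution fusion can likewise be sketched as a per-pixel convex combination of the two branches' logits, weighted by a scale attention in [0, 1]. In HRDA the attention is predicted by a learned network; here it is passed in as a plain array, and the function name is an assumption for the example.

```python
def fuse_logits(lr_logits, hr_logits, attn):
    """Fuse (already upsampled) low-resolution context logits with
    high-resolution detail logits. attn is a per-pixel weight in
    [0, 1] on the high-res branch; in HRDA this weight is produced
    by a learned scale-attention head (sketch only)."""
    return [
        [(1.0 - a) * lo + a * hi
         for lo, hi, a in zip(row_lo, row_hi, row_a)]
        for row_lo, row_hi, row_a in zip(lr_logits, hr_logits, attn)
    ]

# Example: one row of two pixels. Where attn is high, the fused
# prediction follows the high-resolution branch, and vice versa.
fused = fuse_logits([[0.0, 1.0]], [[1.0, 0.0]], [[0.25, 0.75]])
```

The attention lets the model rely on the high-resolution crop for fine structures (e.g. poles, traffic signs) and on the large low-resolution crop for regions that need long-range context.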