

On the Importance of Attention and Augmentations for Hypothesis Transfer in Domain Adaptation and Generalization.

Authors

Sahay Rajat, Thomas Georgi, Jahan Chowdhury Sadman, Manjrekar Mihir, Popp Dan, Savakis Andreas

Affiliation

Rochester Institute of Technology, Rochester, NY 14623, USA.

Publication

Sensors (Basel). 2023 Oct 12;23(20):8409. doi: 10.3390/s23208409.

Abstract

Unsupervised domain adaptation (UDA) aims to mitigate the performance drop due to the distribution shift between the training and testing datasets. UDA methods have achieved performance gains for models trained on a source domain with labeled data to a target domain with only unlabeled data. The standard feature extraction method in domain adaptation has been convolutional neural networks (CNNs). Recently, attention-based transformer models have emerged as effective alternatives for computer vision tasks. In this paper, we benchmark three attention-based architectures, specifically vision transformer (ViT), shifted window transformer (SWIN), and dual attention vision transformer (DAViT), against convolutional architectures ResNet, HRNet and attention-based ConvNext, to assess the performance of different backbones for domain generalization and adaptation. We incorporate these backbone architectures as feature extractors in the source hypothesis transfer (SHOT) framework for UDA. SHOT leverages the knowledge learned in the source domain to align the image features of unlabeled target data in the absence of source domain data, using self-supervised deep feature clustering and self-training. We analyze the generalization and adaptation performance of these models on standard UDA datasets and aerial UDA datasets. In addition, we modernize the training procedure commonly seen in UDA tasks by adding image augmentation techniques to help models generate richer features. Our results show that ConvNext and SWIN offer the best performance, indicating that the attention mechanism is very beneficial for domain generalization and adaptation with both transformer and convolutional architectures. Our ablation study shows that our modernized training recipe, within the SHOT framework, significantly boosts performance on aerial datasets.
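The abstract describes SHOT's core mechanism: pseudo-labelling unlabeled target data by clustering deep features around class centroids derived from the frozen source classifier's predictions. A minimal NumPy sketch of that nearest-centroid pseudo-labelling step is below; this is an illustration of the idea, not the authors' implementation, and the function name, iteration count, and cosine-similarity simplification are our own assumptions.

```python
import numpy as np

def shot_pseudo_labels(features, probs, n_iters=2, eps=1e-8):
    """Sketch of SHOT-style self-supervised pseudo-labelling.

    features: (N, D) target-domain feature vectors from the frozen backbone
    probs:    (N, K) softmax outputs of the source classifier (the hypothesis)
    Returns an (N,) array of integer pseudo-labels from nearest-centroid
    clustering of the target features.
    """
    # L2-normalise features so dot products act like cosine similarity
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    K = probs.shape[1]
    weights = probs  # start from the soft source-classifier predictions
    labels = probs.argmax(axis=1)
    for _ in range(n_iters):
        # class centroids, weighted by the current assignments
        centroids = (weights.T @ f) / (weights.sum(axis=0)[:, None] + eps)
        centroids /= np.linalg.norm(centroids, axis=1, keepdims=True) + eps
        # reassign each sample to its most similar centroid (self-training)
        labels = (f @ centroids.T).argmax(axis=1)
        weights = np.eye(K)[labels]  # one-hot assignments for the next round
    return labels
```

Because centroids pool evidence over the whole target set, this step can correct individual samples that the source classifier initially mislabels, which is what makes the self-training loop useful when no source data is available.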


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0b7/10611075/653d9e96ad6c/sensors-23-08409-g001.jpg
