
Video domain adaptation for semantic segmentation using perceptual consistency matching.

Affiliations

Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea; Division of Intelligent Robotics, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea.

Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea.

Publication information

Neural Netw. 2024 Nov;179:106505. doi: 10.1016/j.neunet.2024.106505. Epub 2024 Jul 3.

Abstract

Unsupervised domain adaptation (UDA) aims to transfer knowledge from previous, related labeled datasets (sources) to a new unlabeled dataset (target). Despite impressive performance, existing approaches have largely focused on image-based UDA, while video-based UDA remains relatively understudied because adapting diverse modal video features and efficiently modeling temporal associations are both difficult. To address this, existing studies use optical flow to capture motion cues between consecutive in-domain frames, but optical flow carries heavy compute requirements, and modeling flow patterns across diverse domains is equally challenging. In this work, we propose an adversarial domain adaptation approach for video semantic segmentation that aims to align temporally associated pixels in successive source and target domain frames without relying on optical flow. Specifically, we introduce a Perceptual Consistency Matching (PCM) strategy that leverages perceptual similarity to identify pixels with high correlation across consecutive frames and infers that such pixels should correspond to the same class. We can therefore enhance prediction accuracy for video UDA by enforcing consistency not only between in-domain frames but also across domains, using PCM objectives during model training. Extensive experiments on public datasets show the benefit of our approach over existing state-of-the-art UDA methods. Our approach not only addresses a crucial task in video domain adaptation but also offers notable performance improvements with faster inference times.
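To make the PCM idea concrete, the sketch below implements one plausible reading of the abstract as a masked consistency loss in PyTorch. Everything specific here is an assumption rather than the authors' released code: cosine similarity stands in for the perceptual measure, a fixed threshold (sim_threshold) selects highly correlated pixels, a symmetric KL term enforces agreement between the matched predictions, and the function name pcm_loss is hypothetical.

import torch
import torch.nn.functional as F

def pcm_loss(feat_t, feat_t1, logits_t, logits_t1, sim_threshold=0.9):
    """Hypothetical PCM-style consistency loss (assumed, not the paper's code).

    feat_t, feat_t1:     (B, C, H, W) per-pixel features for frames t and t+1.
    logits_t, logits_t1: (B, K, H, W) segmentation logits for the same frames.
    """
    # Perceptual similarity between corresponding pixels of consecutive
    # frames; cosine similarity over the channel dimension is an assumption.
    sim = F.cosine_similarity(feat_t, feat_t1, dim=1)  # (B, H, W)

    # Keep only highly correlated pixels, which PCM infers should belong
    # to the same class across the two frames.
    mask = (sim > sim_threshold).float()               # (B, H, W)

    # Symmetric KL divergence between the two frames' class distributions.
    log_p_t = F.log_softmax(logits_t, dim=1)
    log_p_t1 = F.log_softmax(logits_t1, dim=1)
    kl = 0.5 * (
        F.kl_div(log_p_t, log_p_t1.exp(), reduction="none")
        + F.kl_div(log_p_t1, log_p_t.exp(), reduction="none")
    ).sum(dim=1)                                       # (B, H, W)

    # Penalize disagreement only where the perceptual match is confident.
    return (kl * mask).sum() / mask.sum().clamp(min=1.0)

Under this reading, the same objective would be applied both to consecutive frames within a domain and to source/target frame pairs across domains, which is what lets the matching substitute for optical-flow-based correspondence.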

