Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

Lin Bingqian, Long Yanxin, Zhu Yi, Zhu Fengda, Liang Xiaodan, Ye Qixiang, Lin Liang

IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12535-12549. doi: 10.1109/TPAMI.2023.3273594. Epub 2023 Sep 5.

Vision-and-language navigation (VLN) asks an agent to follow a given language instruction to navigate through a real 3D environment. Despite significant advances, conventional VLN agents are typically trained in disturbance-free environments and may easily fail in real-world navigation scenarios, because they do not know how to deal with the many possible disturbances, such as sudden obstacles or human interruptions, which are common and often cause an unexpected route deviation. In this paper, we present a model-agnostic training paradigm, called Progressive Perturbation-aware Contrastive Learning (PROPER), to enhance the generalization ability of existing VLN agents to the real world by requiring them to learn deviation-robust navigation. Specifically, a simple yet effective path perturbation scheme is introduced to implement the route deviation, under which the agent is still required to navigate successfully following the original instruction. Since directly forcing the agent to learn perturbed trajectories may lead to insufficient and inefficient training, a progressively perturbed trajectory augmentation strategy is designed, in which the agent self-adaptively learns to navigate under perturbation as its navigation performance improves on each specific trajectory. To encourage the agent to capture the difference introduced by perturbation and to adapt to both perturbation-free and perturbation-based environments, a perturbation-aware contrastive learning mechanism is further developed by contrasting perturbation-free trajectory encodings with their perturbation-based counterparts. Extensive experiments on the standard Room-to-Room (R2R) benchmark show that PROPER can benefit multiple state-of-the-art VLN baselines in perturbation-free scenarios. We further collect the perturbed path data to construct an introspection subset based on R2R, called Path-Perturbed R2R (PP-R2R). The results on PP-R2R show the unsatisfactory robustness of popular VLN agents and the capability of PROPER to improve navigation robustness under deviation.
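
The abstract does not spell out how a route deviation is constructed, so the following is only a minimal sketch of one plausible path perturbation on a navigation graph, not the authors' scheme. It assumes an undirected graph given as an adjacency dictionary; the names `shortest_path` and `perturb_path`, and the rule "step to one off-route neighbour, then return to the next node of the original route by the shortest path", are hypothetical choices made for illustration.

```python
import random
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: list of nodes from start to goal, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def perturb_path(graph, path, rng):
    """Return a copy of `path` with one off-route detour inserted.

    Hypothetical scheme (an assumption, not taken from the paper): pick a node
    on the route that has a neighbour outside the route, step to that
    neighbour, then walk back to the next node of the original route.
    """
    on_route = set(path)
    # Candidate (index, off-route neighbour) pairs; the goal node is never perturbed.
    candidates = [
        (i, nb)
        for i, node in enumerate(path[:-1])
        for nb in graph[node]
        if nb not in on_route
    ]
    if not candidates:
        return list(path)  # no off-route neighbour exists, leave the path as is
    i, detour = rng.choice(candidates)
    back = shortest_path(graph, detour, path[i + 1])
    if back is None:
        return list(path)
    # Original prefix, then detour and return, then the rest of the route.
    return path[: i + 1] + back + path[i + 2:]


# Toy graph: the route A-B-C-D has a single side room X attached to B.
graph = {
    "A": ["B"],
    "B": ["A", "C", "X"],
    "C": ["B", "D"],
    "D": ["C"],
    "X": ["B"],
}
original = ["A", "B", "C", "D"]
perturbed = perturb_path(graph, original, random.Random(0))
print(perturbed)  # ['A', 'B', 'X', 'B', 'C', 'D']
```

In this toy graph the only possible deviation is into `X`, so the result does not depend on the random seed. The perturbed route keeps the original start and goal, which matches the abstract's requirement that the agent still reach the target by following the original instruction.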

Similar articles

Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning.

IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12535-12549. doi: 10.1109/TPAMI.2023.3273594. Epub 2023 Sep 5.

Correctable Landmark Discovery via Large Models for Vision-Language Navigation.

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8534-8548. doi: 10.1109/TPAMI.2024.3407759. Epub 2024 Nov 6.

Vision-Language Navigation With Beam-Constrained Global Normalization.

IEEE Trans Neural Netw Learn Syst. 2024 Jan;35(1):1352-1363. doi: 10.1109/TNNLS.2022.3183287. Epub 2024 Jan 4.

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments.

IEEE Trans Pattern Anal Mach Intell. 2024 Apr 9;PP. doi: 10.1109/TPAMI.2024.3386695.

HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation.

IEEE Trans Pattern Anal Mach Intell. 2023 Jul;45(7):8524-8537. doi: 10.1109/TPAMI.2023.3234243. Epub 2023 Jun 5.

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation.

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7175-7189. doi: 10.1109/TPAMI.2021.3097435. Epub 2022 Sep 14.

Visual Perception Generalization for Vision-and-Language Navigation via Meta-Learning.

IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):5193-5199. doi: 10.1109/TNNLS.2021.3122579. Epub 2023 Aug 4.

Outdoor Vision-and-Language Navigation Needs Object-Level Alignment.

Sensors (Basel). 2023 Jun 29;23(13):6028. doi: 10.3390/s23136028.

Discovering Intrinsic Subgoals for Vision-and-Language Navigation via Hierarchical Reinforcement Learning.

IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6516-6528. doi: 10.1109/TNNLS.2024.3398300. Epub 2025 Apr 4.

Self-Supervised 3-D Semantic Representation Learning for Vision-and-Language Navigation.

IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6738-6751. doi: 10.1109/TNNLS.2024.3395633. Epub 2025 Apr 4.
