

Visual Perception Generalization for Vision-and-Language Navigation via Meta-Learning

Authors

Wang Ting, Wu Zongkai, Wang Donglin

Publication

IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):5193-5199. doi: 10.1109/TNNLS.2021.3122579. Epub 2023 Aug 4.

DOI: 10.1109/TNNLS.2021.3122579
PMID: 34780332
Abstract

Vision-and-language navigation (VLN) is a challenging task that requires an agent to navigate in real-world environments by understanding natural language instructions and visual information received in real time. Prior works have implemented VLN tasks on continuous environments or physical robots, all of which use a fixed-camera configuration due to the limitations of datasets, such as 1.5-m height, 90° horizontal field of view (HFOV), and so on. However, real-life robots with different purposes have multiple camera configurations, and the huge gap in visual information makes it difficult to directly transfer the learned navigation skills between various robots. In this brief, we propose a visual perception generalization strategy based on meta-learning, which enables the agent to fast adapt to a new camera configuration. In the training phase, we first locate the generalization problem to the visual perception module and then compare two meta-learning algorithms for better generalization in seen and unseen environments. One of them uses the model-agnostic meta-learning (MAML) algorithm that requires few-shot adaptation, and the other refers to a metric-based meta-learning method with a feature-wise affine transformation (AT) layer. The experimental results on the VLN-CE dataset demonstrate that our strategy successfully adapts the learned navigation skills to new camera configurations, and the two algorithms show their advantages in seen and unseen environments respectively.
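The two meta-learning ingredients named in the abstract — MAML-style few-shot adaptation and a feature-wise affine transformation (AT) layer — can be sketched in miniature as follows. This is an illustrative toy only: the linear model, synthetic "camera configuration" task, and learning rate are assumptions for the sketch, not the paper's actual architecture or training setup.

```python
import numpy as np

# Toy sketch, not the paper's implementation: a linear model stands in for
# the visual perception module, and a synthetic regression task stands in
# for adapting to a new camera configuration.

def mse_grad(w, X, y):
    """Gradient of mean-squared error for a linear model y_hat = X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def maml_inner_adapt(w_meta, X_support, y_support, inner_lr=0.1):
    """MAML-style few-shot adaptation: one gradient step on the support
    set of a new task, starting from the meta-learned initialization."""
    return w_meta - inner_lr * mse_grad(w_meta, X_support, y_support)

def affine_transform(features, gamma, beta):
    """Feature-wise affine transformation (AT) layer: scale-and-shift each
    feature channel to modulate the visual representation."""
    return gamma * features + beta

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))          # few-shot support observations
w_true = np.array([1.0, -2.0, 0.5])   # ground-truth task parameters
y = X @ w_true

w_meta = np.zeros(3)                  # stand-in for a meta-learned init
w_task = maml_inner_adapt(w_meta, X, y)

loss_before = np.mean((X @ w_meta - y) ** 2)
loss_after = np.mean((X @ w_task - y) ** 2)
print(loss_after < loss_before)       # one adaptation step reduces task loss

feats = rng.normal(size=(4, 3))       # dummy visual features
modulated = affine_transform(feats, gamma=np.full(3, 0.5), beta=np.zeros(3))
print(modulated.shape)                # (4, 3)
```

In full MAML the outer loop would also backpropagate through this inner step to improve `w_meta` across tasks; the metric-based alternative instead learns the `gamma`/`beta` modulation so that no gradient steps are needed at adaptation time.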


Similar Articles

1. Visual Perception Generalization for Vision-and-Language Navigation via Meta-Learning.
   IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):5193-5199. doi: 10.1109/TNNLS.2021.3122579. Epub 2023 Aug 4.
2. Vision-Language Navigation Policy Learning and Adaptation.
   IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4205-4216. doi: 10.1109/TPAMI.2020.2972281. Epub 2021 Nov 3.
3. Learning to Forget for Meta-Learning via Task-and-Layer-Wise Attenuation.
   IEEE Trans Pattern Anal Mach Intell. 2022 Nov;44(11):7718-7730. doi: 10.1109/TPAMI.2021.3102098. Epub 2022 Oct 4.
4. ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments.
   IEEE Trans Pattern Anal Mach Intell. 2024 Apr 9;PP. doi: 10.1109/TPAMI.2024.3386695.
5. Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning.
   IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12535-12549. doi: 10.1109/TPAMI.2023.3273594. Epub 2023 Sep 5.
6. Outdoor Vision-and-Language Navigation Needs Object-Level Alignment.
   Sensors (Basel). 2023 Jun 29;23(13):6028. doi: 10.3390/s23136028.
7. Unbiased Model-Agnostic Metalearning Algorithm for Learning Target-Driven Visual Navigation Policy.
   Comput Intell Neurosci. 2021 Dec 8;2021:5620751. doi: 10.1155/2021/5620751. eCollection 2021.
8. Correctable Landmark Discovery via Large Models for Vision-Language Navigation.
   IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8534-8548. doi: 10.1109/TPAMI.2024.3407759. Epub 2024 Nov 6.
9. Generalization Enhancement of Visual Reinforcement Learning through Internal States.
   Sensors (Basel). 2024 Jul 12;24(14):4513. doi: 10.3390/s24144513.
10. Learning to Learn Task-Adaptive Hyperparameters for Few-Shot Learning.
    IEEE Trans Pattern Anal Mach Intell. 2024 Mar;46(3):1441-1454. doi: 10.1109/TPAMI.2023.3261387. Epub 2024 Feb 6.