Visual Perception Generalization for Vision-and-Language Navigation via Meta-Learning.

Author Information

Wang Ting, Wu Zongkai, Wang Donglin

Publication Information

IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):5193-5199. doi: 10.1109/TNNLS.2021.3122579. Epub 2023 Aug 4.

Abstract

Vision-and-language navigation (VLN) is a challenging task that requires an agent to navigate in real-world environments by understanding natural language instructions and visual information received in real time. Prior works have implemented VLN tasks on continuous environments or physical robots, all of which use a fixed-camera configuration due to the limitations of datasets, such as 1.5-m height, 90° horizontal field of view (HFOV), and so on. However, real-life robots with different purposes have multiple camera configurations, and the huge gap in visual information makes it difficult to directly transfer the learned navigation skills between various robots. In this brief, we propose a visual perception generalization strategy based on meta-learning, which enables the agent to fast adapt to a new camera configuration. In the training phase, we first locate the generalization problem to the visual perception module and then compare two meta-learning algorithms for better generalization in seen and unseen environments. One of them uses the model-agnostic meta-learning (MAML) algorithm that requires few-shot adaptation, and the other refers to a metric-based meta-learning method with a feature-wise affine transformation (AT) layer. The experimental results on the VLN-CE dataset demonstrate that our strategy successfully adapts the learned navigation skills to new camera configurations, and the two algorithms show their advantages in seen and unseen environments respectively.
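
To make the two adaptation mechanisms named in the abstract more concrete, below is a minimal PyTorch sketch (not the authors' released code) of a feature-wise affine transformation (AT) layer inside a toy visual perception module, together with a single MAML-style inner-loop step that adapts the module on a few frames from a new camera configuration. The names (FeatureWiseAT, VisualEncoder, maml_inner_step), the regression-to-reference-features loss, and the inner learning rate are illustrative assumptions rather than details taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class FeatureWiseAT(nn.Module):
        """Feature-wise affine transformation (AT): per-channel scale and shift,
        intended to normalize features across different camera configurations."""

        def __init__(self, num_channels: int):
            super().__init__()
            self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
            self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.gamma * x + self.beta


    class VisualEncoder(nn.Module):
        """Toy visual perception module: a small CNN followed by an AT layer."""

        def __init__(self):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.at = FeatureWiseAT(64)

        def forward(self, rgb: torch.Tensor) -> torch.Tensor:
            # Global average pooling gives the feature vector consumed by the
            # downstream navigation policy.
            return self.at(self.conv(rgb)).mean(dim=(2, 3))


    def maml_inner_step(encoder, support_rgb, support_target, inner_lr=0.01):
        """One MAML inner-loop step: compute an adaptation loss on a few support
        frames from the new camera and return parameters updated by one gradient
        step, leaving the meta-parameters untouched.

        The loss here regresses the encoder's features toward reference features
        (support_target) computed under the canonical camera setup; this is an
        illustrative stand-in, not necessarily the paper's objective."""
        feats = encoder(support_rgb)
        loss = F.mse_loss(feats, support_target)
        grads = torch.autograd.grad(loss, list(encoder.parameters()), create_graph=True)
        adapted = {
            name: param - inner_lr * grad
            for (name, param), grad in zip(encoder.named_parameters(), grads)
        }
        return adapted, loss


    if __name__ == "__main__":
        encoder = VisualEncoder()
        support_rgb = torch.randn(5, 3, 128, 128)   # 5 frames from the new camera
        support_target = torch.randn(5, 64)         # illustrative reference features
        adapted_params, loss = maml_inner_step(encoder, support_rgb, support_target)
        print(f"inner-loop loss: {loss.item():.4f}, adapted tensors: {len(adapted_params)}")

In a full meta-learning setup, an outer loop would evaluate the adapted parameters on query frames and back-propagate through the inner step to update the meta-parameters, whereas the metric-based variant described in the abstract relies on the feature-wise AT layer rather than gradient-based few-shot adaptation.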

