
DynamicVLN: Incorporating Dynamics into Vision-and-Language Navigation Scenarios

Authors

Sun Yanjun, Qiu Yue, Aoki Yoshimitsu

Affiliations

Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan.

National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Umezono, Tsukuba 305-8560, Japan.

Publication

Sensors (Basel). 2025 Jan 9;25(2):364. doi: 10.3390/s25020364.

Abstract

Traditional Vision-and-Language Navigation (VLN) tasks require an agent to navigate static environments using natural language instructions. However, real-world road conditions such as vehicle movements, traffic signal fluctuations, pedestrian activity, and weather variations are dynamic and continually changing. These factors significantly impact an agent's decision-making ability, underscoring a limitation of current VLN models: they do not accurately reflect the complexities of real-world navigation. To bridge this gap, we propose a novel task called Dynamic Vision-and-Language Navigation (DynamicVLN), which incorporates various dynamic scenarios to enhance the agent's decision-making abilities and adaptability. By redefining the VLN task, we emphasize that a robust and generalizable agent should not rely solely on predefined instructions but must also demonstrate reasoning skills and adaptability to unforeseen events. Specifically, we design ten scenarios that simulate the challenges of dynamic navigation and build a dedicated dataset of 11,261 instances using the CARLA simulator (ver. 0.9.13) and a large language model to provide realistic training conditions. Additionally, we introduce a baseline model that integrates advanced perception and decision-making modules, enabling it to navigate effectively and interpret the complexities of dynamic road conditions. This model demonstrates the ability to follow natural language instructions while dynamically adapting to environmental cues. Our approach establishes a benchmark for developing agents capable of functioning in real-world, dynamic environments, extending beyond the limitations of static VLN tasks toward more practical and versatile applications.
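The abstract does not include code, but the dataset construction it describes rests on scripting dynamic conditions in CARLA 0.9.13. The following is a minimal illustrative sketch, not the authors' pipeline, of how such dynamics (weather variation and moving traffic) can be injected via CARLA's Python API; the host/port, weather values, and vehicle count are placeholder assumptions.

```python
import random

import carla  # CARLA 0.9.13 Python API

# Connect to a CARLA server assumed to be running on localhost:2000.
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Weather variation: e.g., heavy rain with light fog (values are illustrative).
weather = carla.WeatherParameters(
    cloudiness=80.0,
    precipitation=60.0,
    precipitation_deposits=40.0,
    fog_density=10.0,
)
world.set_weather(weather)

# Moving traffic: spawn a handful of autopiloted vehicles at random spawn points.
# (Pedestrians additionally require walker AI controllers and are omitted here.)
blueprints = world.get_blueprint_library().filter("vehicle.*")
spawn_points = world.get_map().get_spawn_points()
for sp in random.sample(spawn_points, min(5, len(spawn_points))):
    bp = random.choice(blueprints)
    vehicle = world.try_spawn_actor(bp, sp)  # returns None if the spot is occupied
    if vehicle is not None:
        vehicle.set_autopilot(True)
```

Recorded sensor streams from runs like this, paired with LLM-generated natural language instructions, would yield training instances of the kind the dataset comprises.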


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df2c/11768887/bb872317b8b8/sensors-25-00364-g001.jpg
