Xie Liang, Zhang Meishan, Li You, Qin Wei, Yan Ye, Yin Erwei
IEEE Trans Neural Netw Learn Syst. 2024 Jan;35(1):1352-1363. doi: 10.1109/TNNLS.2022.3183287. Epub 2024 Jan 4.
Vision-language navigation (VLN) is a challenging task in which an agent is guided through a realistic environment by natural language instructions. Sequence-to-sequence modeling is one of the most promising architectures for the task; it achieves the navigation goal through a sequence of movement actions. This line of work has led to state-of-the-art performance. Recently, several studies showed that beam-search decoding during inference can yield promising performance, as it ranks multiple candidate trajectories by scoring each trajectory as a whole. However, the trajectory-level score might be seriously biased during ranking. The score is a simple average of the individual unit scores of the target-sequence actions, and these unit scores may not be comparable across different trajectories because they are calculated by a local discriminant classifier. To address this problem, we propose a global normalization strategy to rescale the scores at the trajectory level. Concretely, we present two global score functions to rerank all candidates in the output beam, resulting in more comparable trajectory scores. In this way, the bias problem can be greatly alleviated. We conduct experiments on the benchmark Room-to-Room (R2R) dataset of VLN to verify our method, and the results show that the proposed global method is effective, providing significant performance gains over the corresponding baselines. Our final model achieves competitive performance on the VLN leaderboard.
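The contrast between locally averaged and globally normalized trajectory scores can be illustrated with a toy example. The abstract does not define the two global score functions, so the sketch below is not the authors' method: it assumes a hypothetical global score, namely a softmax over the whole beam of each trajectory's mean unnormalized logit. The function names, the candidate names `A` and `B`, and all numbers are invented for illustration.

```python
import math


def local_average_score(step_probs):
    """Baseline: mean of per-step, locally normalized action probabilities."""
    return sum(step_probs) / len(step_probs)


def global_softmax_scores(raw_scores):
    """Assumed global score: softmax over all candidates in the beam."""
    peak = max(raw_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]


def rerank(names, scores):
    """Return candidate names sorted by descending score."""
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order]


# Toy beam (made-up values). Trajectory A passes through steps with few
# competing actions, so its local probabilities are high even though its
# unnormalized logits are low. Trajectory B is the opposite case.
beam = {
    "A": {"step_probs": [0.9, 0.9], "step_logits": [1.0, 1.0]},
    "B": {"step_probs": [0.6, 0.6, 0.6], "step_logits": [2.0, 2.0, 2.0]},
}
names = list(beam)

# Ranking by the averaged local scores.
local_scores = [local_average_score(beam[n]["step_probs"]) for n in names]
local_rank = rerank(names, local_scores)

# Ranking by the assumed beam-level normalization.
raw = [sum(beam[n]["step_logits"]) / len(beam[n]["step_logits"]) for n in names]
global_scores = global_softmax_scores(raw)
global_rank = rerank(names, global_scores)

print("local ranking: ", local_rank)
print("global ranking:", global_rank)
```

In this toy beam the two rankings disagree: the local average favors `A` because each of its steps was normalized against weak alternatives, while the beam-level normalization compares the candidates directly against each other. This only shows how such a bias can arise; the score functions actually proposed in the paper may differ.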