Decouple Ego-View Motions for Predicting Pedestrian Trajectory and Intention.

Author information

Zhang Zhengming, Ding Zhengming, Tian Renran

Publication information

IEEE Trans Image Process. 2024;33:4716-4727. doi: 10.1109/TIP.2024.3445734. Epub 2024 Aug 30.

DOI:10.1109/TIP.2024.3445734

Abstract

Pedestrian trajectory prediction is a critical component of autonomous driving in urban environments, allowing vehicles to anticipate pedestrian movements and facilitate safer interactions. While egocentric-view-based algorithms can reduce the sensing and computation burdens of 3D scene reconstruction, accurately predicting pedestrian trajectories and interpreting their intentions from this perspective requires a better understanding of the coupled vehicle (camera) and pedestrian motions, which has not been adequately addressed by existing models. In this paper, we present a novel egocentric pedestrian trajectory prediction approach that uses a two-tower structure and multi-modal inputs. One tower, the vehicle module, receives only the initial pedestrian position and ego-vehicle actions and speed, while the other, the pedestrian module, receives additional prior pedestrian trajectory and visual features. Our proposed action-aware loss function allows the two-tower model to decompose pedestrian trajectory predictions into two parts, caused by ego-vehicle movement and pedestrian movement, respectively, even when only trained on combined ego-view motions. This decomposition increases model flexibility and provides a better estimation of pedestrian actions and intentions, enhancing overall performance. Experiments on three publicly available benchmark datasets show that our proposed model outperforms all existing algorithms in ego-view pedestrian trajectory prediction accuracy.
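The decomposition described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not specify the network layers or the form of the action-aware loss, so `vehicle_tower`, `pedestrian_tower`, `combine`, `action_aware_loss`, and the constants `K_YAW` and `K_SPEED` are hypothetical stand-ins, not the authors' implementation. The sketch only shows the idea that a predicted ego-view trajectory is the sum of an ego-induced part and a pedestrian-induced part, and that an action label can be used to constrain the pedestrian part.

```python
# Illustrative sketch only. The towers and the loss are hypothetical
# stand-ins; the paper's actual architecture and loss are not given
# in the abstract.

K_YAW = 3    # assumed pixels of horizontal shift per unit of ego yaw rate
K_SPEED = 1  # assumed pixels of vertical shift per unit of ego speed


def vehicle_tower(ego_speeds, ego_yaw_rates):
    """Per-step image-plane shift attributed to ego-vehicle motion alone."""
    return [(-K_YAW * yaw, K_SPEED * v)
            for v, yaw in zip(ego_speeds, ego_yaw_rates)]


def pedestrian_tower(past_steps, horizon):
    """Per-step shift attributed to the pedestrian's own motion.

    Simplification: repeats the last observed step (constant velocity)
    and ignores the visual features a real model would use.
    """
    return [past_steps[-1]] * horizon


def combine(start, ego_steps, ped_steps):
    """Sum both displacement streams and integrate from the start position."""
    x, y = start
    trajectory = []
    for (ex, ey), (px, py) in zip(ego_steps, ped_steps):
        x += ex + px
        y += ey + py
        trajectory.append((x, y))
    return trajectory


def action_aware_loss(pred, target, ped_steps, is_walking, weight=1.0):
    """Mean squared trajectory error plus an assumed action-dependent penalty.

    Assumption: when the pedestrian is labelled as standing, any motion
    assigned to the pedestrian component is penalised, which pushes the
    ego-induced motion into the vehicle tower.
    """
    mse = sum((px - tx) ** 2 + (py - ty) ** 2
              for (px, py), (tx, ty) in zip(pred, target)) / len(pred)
    if is_walking:
        return mse
    penalty = sum(dx * dx + dy * dy for dx, dy in ped_steps) / len(ped_steps)
    return mse + weight * penalty


ego_steps = vehicle_tower(ego_speeds=[2, 2], ego_yaw_rates=[1, 0])
ped_steps = pedestrian_tower(past_steps=[(1, 0), (1, 0)], horizon=2)
predicted = combine(start=(10, 5), ego_steps=ego_steps, ped_steps=ped_steps)
print(predicted)
```

In this sketch only the summed trajectory is compared with the target, mirroring the abstract's point that training uses combined ego-view motions; the action label is the only signal that separates the two components.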
