

Multi-Task Trajectory Prediction Using a Vehicle-Lane Disentangled Conditional Variational Autoencoder.

Authors

Chen Haoyang, Li Na, Shan Hangguan, Liu Eryun, Xiang Zhiyu

Affiliations

The College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China.

Publication Information

Sensors (Basel). 2025 Jul 20;25(14):4505. doi: 10.3390/s25144505.

Abstract

Trajectory prediction under multimodal information is critical for autonomous driving, necessitating the integration of dynamic vehicle states and static high-definition (HD) maps to model complex agent-scene interactions effectively. However, existing methods often employ static scene encodings and unstructured latent spaces, limiting their ability to capture evolving spatial contexts and produce diverse yet contextually coherent predictions. To tackle these challenges, we propose MS-SLV, a novel generative framework that introduces (1) a time-aware scene encoder that aligns HD map features with vehicle motion to capture evolving scene semantics and (2) a structured latent model that explicitly disentangles agent-specific intent and scene-level constraints. Additionally, we introduce an auxiliary lane prediction task to provide targeted supervision for scene understanding and improve latent variable learning. Our approach jointly predicts future trajectories and lane sequences, enabling more interpretable and scene-consistent forecasts. Extensive evaluations on the nuScenes dataset demonstrate the effectiveness of MS-SLV, achieving a 12.37% reduction in average displacement error and a 7.67% reduction in final displacement error over state-of-the-art methods. Moreover, MS-SLV significantly improves multi-modal prediction, reducing the top-5 Miss Rate (MR5) and top-10 Miss Rate (MR10) by 26% and 33%, respectively, and lowering the Off-Road Rate (ORR) by 3%, compared with the strongest baseline in our evaluation.
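The two design elements named in the abstract map naturally onto a conditional-VAE skeleton. The sketch below is a minimal illustration of that idea rather than the MS-SLV implementation: two recurrent encoders summarize the vehicle history and the time-aligned lane features, two separate latent variables capture agent intent and scene constraints, and a pair of decoder heads emit the future trajectory and the lane sequence. All module names, layer sizes, and the choice of plain GRU encoders and linear decoders are illustrative assumptions; for brevity the posterior conditions only on the observed context, whereas a full CVAE would also condition on the ground-truth future during training.

```python
# Minimal sketch of a vehicle-lane disentangled CVAE with multi-task heads.
# Illustrative only: module names, sizes, and the GRU/linear choices are
# assumptions, not the MS-SLV architecture from the paper.
import torch
import torch.nn as nn

class DisentangledCVAE(nn.Module):
    def __init__(self, hist_dim=4, lane_dim=2, hidden=64,
                 z_agent=16, z_scene=16, pred_len=12, lane_len=10):
        super().__init__()
        self.pred_len, self.lane_len = pred_len, lane_len
        # Time-aware encoders: recurrent summaries of past vehicle states and
        # of lane features sampled along the same time steps.
        self.agent_enc = nn.GRU(hist_dim, hidden, batch_first=True)
        self.scene_enc = nn.GRU(lane_dim, hidden, batch_first=True)
        # Separate heads keep the agent-intent and scene latents disentangled.
        self.agent_post = nn.Linear(hidden, 2 * z_agent)   # -> (mu, logvar)
        self.scene_post = nn.Linear(hidden, 2 * z_scene)
        # Multi-task decoders conditioned on both latents plus encoder context.
        dec_in = 2 * hidden + z_agent + z_scene
        self.traj_dec = nn.Linear(dec_in, pred_len * 2)     # future (x, y)
        self.lane_dec = nn.Linear(dec_in, lane_len * 2)     # lane waypoints

    @staticmethod
    def reparameterize(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

    def forward(self, hist, lanes):
        # hist:  (B, T_obs, hist_dim) past vehicle states
        # lanes: (B, T_obs, lane_dim) time-aligned lane features
        _, h_a = self.agent_enc(hist)
        _, h_s = self.scene_enc(lanes)
        h_a, h_s = h_a.squeeze(0), h_s.squeeze(0)
        z_a, mu_a, lv_a = self.reparameterize(self.agent_post(h_a))
        z_s, mu_s, lv_s = self.reparameterize(self.scene_post(h_s))
        ctx = torch.cat([h_a, h_s, z_a, z_s], dim=-1)
        traj = self.traj_dec(ctx).view(-1, self.pred_len, 2)
        lane = self.lane_dec(ctx).view(-1, self.lane_len, 2)
        return traj, lane, (mu_a, lv_a), (mu_s, lv_s)
```

Training such a sketch would combine trajectory and lane reconstruction losses with separate KL terms for the two latents; the auxiliary lane head is what gives the scene latent its targeted supervision. For example, `model(torch.randn(8, 6, 4), torch.randn(8, 6, 2))` returns an 8x12x2 trajectory and an 8x10x2 lane sequence.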

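For reference, the displacement and miss-rate figures quoted in the abstract are standard nuScenes-style measures. The sketch below shows one common way to compute them; the 2 m threshold and the max-pointwise-error miss criterion follow the usual nuScenes convention and are assumptions here, not details taken from the paper (the Off-Road Rate additionally requires drivable-area geometry from the HD map and is omitted).

```python
# Sketch of the standard displacement and miss-rate metrics referenced above.
import torch

def ade_fde(pred: torch.Tensor, gt: torch.Tensor):
    """pred, gt: (B, T, 2) predicted and ground-truth future positions."""
    dist = torch.norm(pred - gt, dim=-1)                   # (B, T) per-step L2 error
    return dist.mean().item(), dist[:, -1].mean().item()   # ADE, FDE

def miss_rate_topk(preds: torch.Tensor, gt: torch.Tensor,
                   k: int = 5, threshold: float = 2.0) -> float:
    """preds: (B, K, T, 2) K sampled trajectories per agent; gt: (B, T, 2)."""
    err = torch.norm(preds[:, :k] - gt[:, None], dim=-1)   # (B, k, T)
    max_err = err.amax(dim=-1)                             # worst point per sample
    # A scene counts as a miss if even the best of the top-k samples
    # strays beyond the threshold at some time step.
    return (max_err.amin(dim=-1) > threshold).float().mean().item()
```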

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7436/12298317/7620b7b30d64/sensors-25-04505-g001.jpg
