Xu Wei, Du Xiaodong, Li Ruochen, Li Bingjie, Jiao Yuhu, Xing Lei
Shandong University of Science and Technology, College of Transportation, Qingdao, 266590, China.
Sci Rep. 2025 May 20;15(1):17472. doi: 10.1038/s41598-025-99524-5.
While multi-object tracking is critical for autonomous driving systems, traditional algorithms exhibit three fundamental limitations in complex scenarios: (1) blurred feature representation under occlusion and re-identification scenarios causing identity switches, (2) insufficient sensitivity to scale-variant targets due to fixed geometric constraints in conventional IoU-based loss functions, and (3) gradient degradation in deep convolutional layers hindering discriminative feature learning. To address these challenges, we propose AE-StrongSORT (Attention-Enhanced StrongSORT), an attention-enhanced tracking framework featuring three systematic innovations: first, the GAM-YOLO (global attention mechanism-YOLO) hybrid architecture integrates multi-scale feature fusion with a global attention mechanism (GC2f structure). This design enhances cross-dimensional feature interaction through localized channel-spatial attention gates, significantly improving occlusion-resistant feature representation (IDF1 ↑ 9.99%, IDsw ↓ 9.85%). Second, the F-EIoU loss function introduces dynamic size-dependent penalty terms and difficulty-adaptive weighting factors, effectively balancing learning priorities between small targets and normal instances. Third, the optimized CBH-Conv module employs Hardswish activation and depthwise separable convolution to mitigate gradient vanishing while maintaining real-time efficiency (achieving a 17% MOTA improvement at 213 FPS). Evaluated on the MOT-16 dataset, AE-StrongSORT demonstrates substantial improvements over the baseline StrongSORT, with 17%, 2.78%, and 9.99% gains in MOTA, HOTA, and IDF1 metrics respectively, alongside significant reductions in false/missed detections. These advances establish a novel technical pathway for robust vehicle tracking in real-world traffic scenarios characterized by coexisting challenges of scale variation, motion blur, and dense occlusion.
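The abstract does not give the exact F-EIoU formulation, so the following is a minimal sketch of a Focal-EIoU-style bounding-box loss consistent with the description: explicit width/height penalty terms (which make the loss size-dependent) plus an IoU-based focal weighting factor that adapts the penalty to sample difficulty. The function name `eiou_loss` and the exponent `gamma` are illustrative assumptions, not the paper's notation.

```python
def eiou_loss(box_a, box_b, gamma=0.5):
    """Sketch of a Focal-EIoU-style loss on axis-aligned boxes (x1, y1, x2, y2).

    Assumption: follows the published Focal-EIoU form
    L = IoU^gamma * (1 - IoU + center term + width term + height term);
    the paper's exact F-EIoU terms are not stated in the abstract.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)

    # Smallest enclosing box: width, height, squared diagonal
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2

    # Center-distance penalty plus explicit width/height penalties:
    # the size-dependent terms that sharpen scale sensitivity
    rho2 = ((ax1 + ax2 - bx1 - bx2) / 2) ** 2 + ((ay1 + ay2 - by1 - by2) / 2) ** 2
    dw2 = ((ax2 - ax1) - (bx2 - bx1)) ** 2
    dh2 = ((ay2 - ay1) - (by2 - by1)) ** 2
    l_eiou = 1.0 - iou + rho2 / c2 + dw2 / cw ** 2 + dh2 / ch ** 2

    # Focal weighting: re-weights the regression loss by match quality (IoU)
    return iou ** gamma * l_eiou
```

A perfectly matched box pair yields zero loss, while the width/height terms keep the gradient informative for small targets whose IoU alone changes little under localization error.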
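The abstract names the CBH-Conv module's two ingredients without specifying its layout, so the sketch below only illustrates those ingredients in isolation: the Hardswish activation (smooth near zero, which helps gradients survive deep stacks better than hard cutoffs) and the parameter savings of depthwise separable convolution over a standard convolution. The helper names are hypothetical.

```python
def hardswish(x):
    """Hardswish activation: x * ReLU6(x + 3) / 6 (standard definition)."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(k, c_in, c_out):
    """Weight count of a depthwise separable convolution:
    a k x k depthwise pass plus a 1 x 1 pointwise pass (bias ignored)."""
    return k * k * c_in + c_in * c_out
```

For a 3x3 layer with 64 input and 128 output channels, the separable variant needs 8,768 weights versus 73,728 for the standard convolution, roughly an 8x reduction, which is what makes the real-time figure plausible alongside the added attention machinery.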