Wang Bin, Li Chao, Zhou Chao, Sun Jun
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China.
PeerJ Comput Sci. 2025 Jun 2;11:e2932. doi: 10.7717/peerj-cs.2932. eCollection 2025.
In vehicle safety detection, accurately identifying body markers on medium and large vehicles is critical to safe road travel. To address the loss of feature and gradient information in previous You Only Look Once (YOLO) series models, we design a novel Vehicle Body Markers YOLO (VBM-YOLO) model. First, the model integrates the cross-spatial-channel attention (CSCA) mechanism proposed in this study. CSCA uses cross-dimensional information to address interaction issues that arise when the spatial and channel dimensions are fused, significantly enhancing the model's representational capacity. Second, we propose a multi-scale selective feature pyramid network (MSSFPN). Through progressive fusion and multi-scale feature selection learning, MSSFPN alleviates the feature loss and target-layer information confusion caused by traditional top-down and bottom-up feature pyramids. Finally, we propose an auxiliary gradient branch (AGB). During training, AGB incorporates feature information from different target layers to help the current layer retain complete gradient information. Because AGB does not participate in model inference, it adds no inference-time overhead. Experimental results show that, on the vehicle body markers dataset, VBM-YOLO improves mean average precision (mAP) over YOLOv8s by 2.3% at an intersection over union (IoU) threshold of 0.5 and by 4.3% over the IoU threshold range 0.5:0.95. VBM-YOLO also achieves a better balance between accuracy and computational cost than other mainstream models, and it generalizes well to public datasets such as PASCAL VOC and D-Fire.
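The two IoU settings quoted in the abstract, 0.5 and 0.5:0.95, can be illustrated with a minimal sketch. It assumes the common COCO-style reading of 0.5:0.95 as ten thresholds from 0.50 to 0.95 in steps of 0.05; the abstract does not state this explicitly, and the helper names `iou`, `iou_thresholds`, and `matches_at` are hypothetical, not taken from the paper. The sketch covers only the matching criterion for a single predicted box, not a full mAP computation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds():
    """Assumed COCO-style thresholds for 0.5:0.95 (0.50, 0.55, ..., 0.95)."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matches_at(iou_value, thresholds):
    """For each threshold, whether a prediction with this IoU counts as a match."""
    return [iou_value >= t for t in thresholds]
```

Under this reading, a prediction whose IoU with the ground-truth box is 0.72 counts as correct at the single 0.5 threshold but at only five of the ten thresholds in 0.5:0.95, which is why the averaged metric is the stricter of the two.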