Yao Xinpeng, Liu Peiyuan, Zhou Jingmei, Wang Zijian, Fan Songhua, Wang Yuchen
Shandong Key Laboratory of Smart Transportation (Preparation), Jinan, China.
School of Electronics and Control Engineering, Chang'an University, Xi'an, China.
PLoS One. 2025 Jun 27;20(6):e0325373. doi: 10.1371/journal.pone.0325373. eCollection 2025.
To address the low detection accuracy and unreliable recognition of small, irregular targets such as cyclists by existing 3D object detection algorithms, MAT-PointPillars (Multi-scale Attention and Transformer PointPillars) extends PointPillars with multi-scale vision Transformers and attention mechanisms. First, the algorithm employs pillar coding for semantic point cloud encoding and introduces an attention mechanism to refine the backbone's upsampling process. In addition, a Transformer Encoder is introduced to improve the upsampling structure of the third stage of the backbone. On the KITTI dataset, the algorithm achieves 3D average detection accuracy (AP3D) of 81.15%, 62.02%, and 58.68% across the three difficulty levels, improving on the baseline model by 2.44%, 1.19%, and 1.23%, respectively. A real-time 3D object detection system built on ROS runs at an average of 22.63 frames per second, which exceeds the sampling frequency of conventional LiDAR. While maintaining sufficient detection speed, the MAT-PointPillars algorithm improves the detection accuracy of cyclists in real-world scenarios.
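The abstract's core idea is refining upsampled pillar features with attention so that small targets such as cyclists borrow context from neighboring pillars. The sketch below is not the authors' implementation; it is a minimal NumPy illustration of single-head scaled dot-product self-attention (the operation at the heart of a Transformer encoder) applied to a set of flattened pillar feature vectors, with identity Q/K/V projections for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feats):
    """Single-head scaled dot-product self-attention over pillar
    feature vectors of shape (n_pillars, d). Identity Q/K/V
    projections are used here for brevity; a real Transformer
    encoder learns these projections plus a feed-forward sublayer."""
    d = feats.shape[-1]
    scores = feats @ feats.T / np.sqrt(d)   # (n, n) pairwise similarity
    weights = softmax(scores, axis=-1)      # attention weights, rows sum to 1
    return weights @ feats                  # context-mixed (refined) features

# Toy example: a BEV feature map flattened to 4 pillar tokens, 8 channels.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))
refined = self_attention(tokens)
print(refined.shape)  # same shape as the input: each pillar now mixes context
```

In the paper's pipeline this kind of block would sit inside the backbone's upsampling path, so the output keeps the feature-map shape while each location aggregates information from the whole scene.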