Spatial Attention Frustum: A 3D Object Detection Method Focusing on Occluded Objects.

Affiliation

School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China.

Publication Information

Sensors (Basel). 2022 Mar 18;22(6):2366. doi: 10.3390/s22062366.

Abstract

Achieving the accurate perception of occluded objects for autonomous vehicles is a challenging problem. Human vision can quickly locate important object regions in complex external scenes, while other regions are only roughly analysed or ignored; this behaviour is known as the visual attention mechanism. However, the perception system of an autonomous vehicle does not know which part of the point cloud lies in the region of interest. It is therefore worthwhile to explore how the visual attention mechanism can be used in the perception system of autonomous driving. In this paper, we propose the spatial attention frustum model to address object occlusion in 3D object detection. The spatial attention frustum suppresses unimportant features and allocates limited neural computing resources to critical parts of the scene, thereby providing greater relevance and easier processing for higher-level perceptual reasoning tasks. To ensure that our method retains good reasoning ability when faced with occluded objects that preserve only a partial structure, we propose a local feature aggregation module to capture more complex local features of the point cloud. Finally, we discuss the projection constraint relationship between the 3D bounding box and the 2D bounding box and propose a joint anchor box projection loss function, which helps improve the overall performance of our method. Results on the KITTI dataset show that the proposed method effectively improves the detection accuracy of occluded objects. Our method achieves 89.46%, 79.91% and 75.53% detection accuracy on the easy, moderate and hard difficulty levels of the car category, and in particular achieves a 6.97% performance improvement in the hard category, where occlusion is most severe. Our one-stage method does not rely on an additional refinement stage, yet its accuracy is comparable to that of two-stage methods.
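
The two geometric ideas named in the abstract, restricting attention to the viewing frustum behind a 2D detection and tying a predicted 3D box to its 2D projection, can be illustrated with a short, hedged sketch. The function names, the intrinsic matrix values and the L1 form of the projection gap below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def points_in_frustum(points_cam, K, box2d):
    """Keep points (N, 3) in rectified camera coordinates whose image
    projection falls inside the 2D box [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box2d
    in_front = points_cam[:, 2] > 1e-3          # drop points behind the image plane
    uvw = (K @ points_cam.T).T                   # pinhole projection
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]
    mask = in_front & (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return points_cam[mask]

def box3d_corners(center, size, yaw):
    """Eight corners of a 3D box in camera coordinates; size = (l, w, h),
    yaw is the rotation about the camera y axis (KITTI convention)."""
    l, w, h = size
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y = np.array([0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h])   # box origin on the ground plane
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    R = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                  [ 0.0,         1.0, 0.0        ],
                  [-np.sin(yaw), 0.0, np.cos(yaw)]])
    return (R @ np.vstack([x, y, z])).T + np.asarray(center)

def projected_2d_box(corners, K):
    """Project the 3D corners into the image and take their tight 2D bounding box."""
    uvw = (K @ corners.T).T
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]
    return np.array([u.min(), v.min(), u.max(), v.max()])

if __name__ == "__main__":
    # Illustrative intrinsics only, loosely based on KITTI's P2 camera.
    K = np.array([[721.5,   0.0, 609.6],
                  [  0.0, 721.5, 172.9],
                  [  0.0,   0.0,   1.0]])
    box2d = np.array([500.0, 150.0, 700.0, 300.0])       # hypothetical 2D detection

    pts = np.random.uniform([-10.0, -2.0, 5.0], [10.0, 2.0, 40.0], size=(5000, 3))
    frustum_pts = points_in_frustum(pts, K, box2d)

    corners = box3d_corners(center=(1.5, 1.6, 20.0), size=(3.9, 1.6, 1.5), yaw=0.3)
    proj = projected_2d_box(corners, K)
    # One plausible form of a projection constraint: penalise the gap between the
    # projected 3D box and the matched 2D box (the paper's exact loss may differ).
    print(frustum_pts.shape, np.abs(proj - box2d).mean())
```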

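The local feature aggregation module is only named in the abstract; as a rough, assumed analogue, a PointNet-style k-nearest-neighbour grouping with symmetric max pooling captures per-point neighbourhood features. Everything below (function name, k, feature shapes) is a hypothetical sketch, not the paper's module:

```python
import numpy as np

def aggregate_local_features(points, features, k=16):
    """For each point, pool features over its k nearest neighbours:
    points (N, 3), features (N, C) -> aggregated features (N, C)."""
    # Pairwise squared distances, O(N^2) for clarity rather than speed.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d2, axis=1)[:, :k]      # indices of the k closest points
    grouped = features[knn]                  # (N, k, C) neighbourhood features
    return grouped.max(axis=1)               # symmetric max pooling over the neighbourhood

if __name__ == "__main__":
    pts = np.random.randn(256, 3)
    feats = np.random.randn(256, 32)
    print(aggregate_local_features(pts, feats).shape)    # (256, 32)
```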

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2142/8955271/f65793fbf7ac/sensors-22-02366-g001.jpg
