AD-DETR: DETR with asymmetrical relation and decoupled attention in crowded scenes.

Authors

Huang Yueming, Yuan Guowu

Affiliations

School of Information Science and Engineering, Yunnan University, Kunming 650504, China.

Yunnan Key Laboratory of Intelligent Systems and Computing, Kunming 650504, China.

Publication

Math Biosci Eng. 2023 Jun 26;20(8):14158-14179. doi: 10.3934/mbe.2023633.

Abstract

Pedestrian detection in crowded scenes is widely used in computer vision. However, it still faces two difficulties: 1) eliminating duplicate predictions (multiple predictions corresponding to the same object); 2) false and missed detections caused by heavy occlusion and the small visible area of the pedestrians to be detected. This paper presents a detection framework based on DETR (detection transformer) to address these problems; the model is called AD-DETR (asymmetrical relation detection transformer). We find that the symmetry in a DETR framework causes synchronous prediction updates and duplicate predictions. Therefore, we propose an asymmetric relation fusion mechanism that lets each query asymmetrically fuse the relative relations of surrounding predictions, so that the model learns to eliminate duplicate predictions. We then propose a decoupled cross-attention head that allows the model to learn to restrict the range of attention and to focus more on visible regions and on regions that contribute more to confidence. This reduces the noise introduced by occluded objects and thereby lowers the false detection rate. Within the proposed asymmetric relation module, we also establish a way to encode the relative relations between sets of attention points, improving on the baseline. Without additional annotations, combined with deformable DETR with Res50 as the backbone, our method achieves an average precision of 92.6%, MR$^{-2}$ of 40.0% and a Jaccard index of 84.4% on the challenging CrowdHuman dataset. It surpasses previous methods such as Iter-E2EDet (progressive end-to-end object detection) and MIP (one proposal, multiple predictions). Experiments show that our method significantly improves the performance of query-based models in crowded scenes and is highly robust in such settings.
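
The abstract's central mechanism, letting each query fuse relation features from surrounding predictions in a one-way (asymmetric) manner so that duplicates can learn to suppress themselves, can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the authors' implementation: the module name AsymmetricRelationFusion, the confidence-based ordering used to break symmetry, and the relative-box encoding are all hypothetical choices made for demonstration.

```python
# Minimal sketch of an *asymmetric* relation-fusion step between DETR-style
# object queries. Assumptions (not from the paper): information flows only
# from higher- to lower-scoring predictions, and pairwise relations are
# encoded from relative box geometry.
import torch
import torch.nn as nn


def relative_box_encoding(boxes: torch.Tensor) -> torch.Tensor:
    """Pairwise (N, N, 4) relative geometry between predicted boxes (cx, cy, w, h)."""
    cx, cy, w, h = boxes.unbind(-1)                       # each (N,)
    dx = (cx[None, :] - cx[:, None]) / (w[:, None] + 1e-6)
    dy = (cy[None, :] - cy[:, None]) / (h[:, None] + 1e-6)
    dw = torch.log((w[None, :] + 1e-6) / (w[:, None] + 1e-6))
    dh = torch.log((h[None, :] + 1e-6) / (h[:, None] + 1e-6))
    return torch.stack([dx, dy, dw, dh], dim=-1)          # (N, N, 4)


class AsymmetricRelationFusion(nn.Module):
    """Each query fuses relation features only from queries ranked above it."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.rel_mlp = nn.Sequential(nn.Linear(4, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.score = nn.Linear(dim, 1)
        self.out = nn.Linear(dim, dim)

    def forward(self, queries: torch.Tensor, boxes: torch.Tensor,
                conf: torch.Tensor) -> torch.Tensor:
        # queries: (N, dim), boxes: (N, 4) in (cx, cy, w, h), conf: (N,)
        rel = self.rel_mlp(relative_box_encoding(boxes))   # (N, N, dim)
        logits = self.score(rel).squeeze(-1)                # (N, N) relation weights

        # Asymmetry: query i only receives messages from queries j with higher
        # confidence, so the interaction is directional rather than symmetric.
        allowed = conf[None, :] > conf[:, None]             # (N, N) boolean mask
        logits = logits.masked_fill(~allowed, float("-inf"))
        weights = torch.softmax(logits, dim=-1)
        weights = torch.nan_to_num(weights)                 # rows with no senders -> 0

        fused = torch.einsum("ij,ijd->id", weights, rel)    # aggregate relation features
        return queries + self.out(fused)                    # residual update


# Toy usage: 5 queries with random boxes and confidences.
q = torch.randn(5, 256)
b = torch.rand(5, 4)
c = torch.rand(5)
updated = AsymmetricRelationFusion()(q, b, c)
print(updated.shape)  # torch.Size([5, 256])
```

The point the sketch tries to capture is that, unlike standard symmetric self-attention, the interaction matrix here is directional: a prediction only receives information from predictions ranked above it, so a pair of duplicates does not update identically and one of them can learn to drop its confidence.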
