Tang Shuyuan, Zhou Yiqing, Li Jintao, Liu Chang, Shi Jinglin
State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China.
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.
Sensors (Basel). 2024 Sep 30;24(19):6350. doi: 10.3390/s24196350.
Occlusion is a major obstacle for pedestrian detection based on computer vision. It takes two forms: inter-class occlusion, in which environmental objects obscure pedestrians, and intra-class occlusion, in which pedestrians obscure one another. In complex and variable urban settings, these compounded occlusion patterns limit the efficacy of both one-stage and two-stage pedestrian detectors. To address this, we introduce the Attention-Guided Feature Enhancement Network (AGFEN), an architecture built within the deep convolutional neural network framework. AGFEN improves the semantic information of high-level features by mapping it onto low-level feature details through sampling, creating an effect comparable to mask modulation. This technique enhances channel-level and spatial-level features concurrently without incurring additional annotation costs. We also replace the traditional one-to-one correspondence between proposals and predictions with a one-to-multiple paradigm, so that non-maximum suppression operates on the prediction set as its fundamental unit. Finally, we integrate these methods by aggregating local features between regions of interest (RoIs) through the reuse of classification weights, which mitigates false positives. Experimental evaluations on three widely used datasets show that AGFEN achieves a 2.38% improvement over the baseline detector on the CrowdHuman dataset, underscoring its effectiveness and potential for advancing pedestrian detection.
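The abstract does not spell out how non-maximum suppression works when the prediction set is the basic unit. One plausible reading, sketched below, is that predictions emitted by the same proposal never suppress each other, while predictions from different proposals are filtered by the usual IoU test. This is a minimal illustration under that assumption and not the authors' implementation; the names `iou`, `set_nms`, `proposal_ids`, and the threshold of 0.5 are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def set_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    proposal_ids: Sequence[int],
    iou_thresh: float = 0.5,  # assumed value, not taken from the paper
) -> List[int]:
    """Greedy NMS in which boxes from the same proposal are exempt
    from suppressing one another (assumed behaviour)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        suppressed = False
        for j in keep:
            # Predictions that share a proposal form one set: skip the test.
            if proposal_ids[i] == proposal_ids[j]:
                continue
            if iou(boxes[i], boxes[j]) > iou_thresh:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (0.5, 0, 10.5, 10)]
    scores = [0.9, 0.8, 0.7]
    # The first two boxes come from one proposal, the third from another.
    print(set_nms(boxes, scores, proposal_ids=[0, 0, 1]))
```

In the usage example, the first two boxes overlap heavily but share a proposal, so both survive; the third comes from a different proposal and is suppressed by the first. Giving every box its own proposal id reduces the function to ordinary greedy NMS, which would keep only the highest-scoring box in this scene.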