School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China.
School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; Institute of Artificial Intelligence and Robotics, College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China.
Neural Netw. 2022 Apr;148:219-231. doi: 10.1016/j.neunet.2022.01.015. Epub 2022 Jan 29.
Background noise and scale variation are long-recognized problems in crowd counting. Humans can glance at a crowd image and instantly estimate the approximate number of people and their locations by attending, with a global receptive field, to the crowd regions and their degree of congestion. Hence, in this paper, we propose a novel feedback network with a Region-Aware block, called RANet, that models the human top-down visual perception mechanism. First, we introduce a feedback architecture to generate priority maps that provide a prior on candidate crowd regions in the input images. This prior enables RANet to pay more attention to crowd regions. Then, we design a Region-Aware block that adaptively encodes contextual information into the input images through a global receptive field. More specifically, we scan each input image and its priority maps in the form of column vectors to obtain a relevance matrix that estimates their similarity. The relevance matrix is then used to build global relationships between pixels. Our method outperforms state-of-the-art crowd counting methods on several public datasets.
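The relevance-matrix step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the similarity function, the normalisation, or how the result is combined with the input, so the scaled dot product, the row-wise softmax, and the residual sum below are all assumptions. The function name `region_aware_sketch` and the array shapes are likewise hypothetical.

```python
import numpy as np


def region_aware_sketch(features, priority):
    """Illustrative only: reweight features using a pixel-to-pixel relevance matrix.

    features: array of shape (C, H, W), hypothetical feature maps of the input image.
    priority: array of shape (C, H, W), hypothetical priority maps from the feedback pass.
    """
    c, h, w = features.shape
    n = h * w
    # Flatten both maps so that each pixel becomes one column vector of length C.
    x = features.reshape(c, n)
    p = priority.reshape(c, n)
    # Relevance matrix: similarity of every image pixel to every priority-map pixel.
    # The scaled dot product is an assumed similarity measure.
    relevance = x.T @ p / np.sqrt(c)  # shape (N, N)
    # Row-wise softmax (assumed normalisation), so each pixel's weights sum to 1.
    relevance -= relevance.max(axis=1, keepdims=True)
    weights = np.exp(relevance)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each output pixel aggregates features from all positions (global receptive field).
    context = x @ weights.T  # shape (C, N)
    # Residual sum (assumed) keeps the original features alongside the context.
    return (x + context).reshape(c, h, w), weights


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 3, 5))
prior = rng.standard_normal((4, 3, 5))
out, weights = region_aware_sketch(feats, prior)
print(out.shape, weights.shape)
```

Because the relevance matrix has one row and one column per pixel, its memory cost grows quadratically with the number of pixels, which is why sketches like this are usually applied to downsampled feature maps rather than full-resolution images.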