IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9056-9072. doi: 10.1109/TPAMI.2021.3124956. Epub 2022 Nov 7.
To estimate the number of heads and localize each head with a bounding box at the same time, we adopt detection-based crowd counting on RGB-D data and design a dual-path guided detection network (DPDNet). To improve detection-based approaches on dense and tiny heads, we propose a density map guided detection module, which uses the density map to improve head/non-head classification in the detection network, since the density indicates the probability that a pixel belongs to a head. We also introduce a depth-adaptive kernel that accounts for the variance in head sizes to generate a high-fidelity density map for more robust density map regression. To prevent dense heads from being filtered out during post-processing, we use this density map in the post-processing of head detection and propose a density map guided non-maximum suppression (NMS) strategy. Meanwhile, to improve the detection of small heads, we propose a depth-guided detection module that generates a dynamic dilated convolution to extract features of heads at different scales, and we further design a depth-aware anchor for better initialization of anchor sizes in the detection framework. We then train DPDNet with bounding boxes whose sizes are generated from depth. Because existing RGB-D datasets are too small to evaluate data-driven approaches, we collect two large-scale RGB-D crowd counting datasets, one synthetic and one real-world. Since depth values at long-distance positions cannot be obtained in the real-world dataset, we further propose a depth completion method based on meta learning, which uses the synthetic depth data to complete the depth values at those positions. Extensive experiments on our two proposed RGB-D datasets and the MICC RGB-D counting dataset show that our method achieves the best performance for RGB-D crowd counting and localization.
Furthermore, our method extends easily to RGB-image-based crowd counting, where it achieves comparable or even better performance on RGB datasets for both head counting and localization.
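To make the idea of a depth-adaptive kernel concrete, the following is a minimal sketch and not the DPDNet implementation. It assumes that apparent head size shrinks inversely with depth, so each annotated head is stamped as a Gaussian whose width is `scale / depth`. The function name, the `scale` and `min_sigma` parameters, and the fallback for missing depth are all hypothetical choices made for illustration.

```python
import math


def depth_adaptive_density_map(heads, depth, height, width,
                               scale=4.0, min_sigma=0.5):
    """Build a density map from head points using depth-dependent Gaussians.

    heads: list of (row, col) head annotations.
    depth: 2D list of depth values; non-positive entries mean "missing".
    All parameters and the sigma rule are illustrative assumptions.
    """
    density = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        d = depth[r][c]
        # Assumption: sigma is inversely proportional to depth, so near
        # heads get wide kernels and far heads get narrow ones.
        sigma = min_sigma if d <= 0 else max(min_sigma, scale / d)
        radius = max(1, int(math.ceil(3 * sigma)))
        weights = {}
        for i in range(max(0, r - radius), min(height, r + radius + 1)):
            for j in range(max(0, c - radius), min(width, c + radius + 1)):
                dist2 = (i - r) ** 2 + (j - c) ** 2
                weights[(i, j)] = math.exp(-dist2 / (2 * sigma ** 2))
        total = sum(weights.values())
        for (i, j), w in weights.items():
            # Normalize so that every head contributes exactly 1 to the map.
            density[i][j] += w / total
    return density


# Toy scene: the upper half is near (depth 2), the lower half is far (depth 8).
depth = [[2.0] * 32 for _ in range(16)] + [[8.0] * 32 for _ in range(16)]
heads = [(8, 8), (24, 20)]
dm = depth_adaptive_density_map(heads, depth, 32, 32)
count = sum(map(sum, dm))  # summing the map recovers the head count
```

Because each kernel is normalized, the sum over the map equals the number of annotated heads, while the peak value at a near head is lower than at a far head because its mass is spread over a wider area.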