Gao Jiaqi, Huang Zhizhong, Lei Yiming, Shan Hongming, Wang James Z, Wang Fei-Yue, Zhang Junping
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):299-312. doi: 10.1109/TNNLS.2023.3336774. Epub 2025 Jan 7.
Most conventional crowd counting methods use a fully supervised learning framework to establish a mapping between scene images and crowd density maps. They usually rely on a large quantity of costly and time-intensive pixel-level annotations for training supervision. One way to mitigate the intensive labeling effort and improve counting accuracy is to leverage large amounts of unlabeled images. This is possible because of the inherent self-structural information and rank consistency within a single image, which offer additional qualitative relation supervision during training. In contrast to earlier methods that used rank relations at the original image level, we explore such rank-consistency relations within the latent feature spaces. This approach enables the incorporation of numerous pyramid partial orders, strengthening the model's representation capability. A notable advantage is that it can also increase the utilization ratio of unlabeled samples. Specifically, we propose a Deep Rank-consistEnt pyrAmid Model (DREAM), which makes full use of rank consistency across coarse-to-fine pyramid features in latent spaces for enhanced crowd counting with massive unlabeled images. In addition, we have collected a new unlabeled crowd counting dataset, FUDAN-UCC, comprising 4000 images for training purposes. Extensive experiments on four benchmark datasets, namely UCF-QNRF, ShanghaiTech PartA and PartB, and UCF-CC-50, show the effectiveness of our method compared with previous semi-supervised methods. The code is available at https://github.com/bridgeqiqi/DREAM.
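The rank-consistency idea mentioned in the abstract can be illustrated with a minimal sketch. It shows only the simpler image-level form attributed to earlier methods: a crop nested inside a larger crop cannot contain more people, so predicted counts over nested crops should be non-increasing. It is not the DREAM model and does not operate on latent pyramid features. The function names `nested_crop_counts` and `rank_consistency_loss`, the centered-crop scheme, and the `scale` and `margin` parameters are all assumptions made for illustration.

```python
import numpy as np


def nested_crop_counts(density, num_levels=3, scale=0.5):
    """Sum a 2-D density map over centered crops that shrink by `scale` per level.

    Hypothetical helper: level 0 is the full map, and each later crop lies
    inside the previous one.
    """
    h, w = density.shape
    counts = []
    for level in range(num_levels):
        crop_h = max(1, int(h * scale ** level))
        crop_w = max(1, int(w * scale ** level))
        top = (h - crop_h) // 2
        left = (w - crop_w) // 2
        crop = density[top:top + crop_h, left:left + crop_w]
        counts.append(float(crop.sum()))
    return counts


def rank_consistency_loss(counts, margin=0.0):
    """Hinge penalty whenever a smaller (inner) crop is counted higher than
    the larger crop that contains it. Needs no ground-truth labels."""
    loss = 0.0
    for outer, inner in zip(counts[:-1], counts[1:]):
        loss += max(0.0, inner - outer + margin)
    return loss


if __name__ == "__main__":
    predicted = np.ones((8, 8))  # stand-in for a predicted density map
    counts = nested_crop_counts(predicted)
    print(counts, rank_consistency_loss(counts))
```

Because the penalty depends only on the ordering of predicted counts, a term of this kind can in principle be computed on unlabeled images, which is the property the abstract relies on.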