Cao Rundong, Yu Jiazhong, Liu Ziwei, Liang Qinghua
China Tower Corporation Limited, Beijing, China.
PLoS One. 2025 Jul 2;20(7):e0327397. doi: 10.1371/journal.pone.0327397. eCollection 2025.
In open environments, complex and variable backgrounds and dense multi-scale targets are two key challenges for crowd counting. Because current methods rely on supervised learning with labeled data, they struggle to adapt to crowd detection in complex scenarios when training data is limited. Moreover, detection-based methods may miss many targets when dealing with dense groups of small-scale targets. This paper proposes a simple yet effective point-based contrastive learning method to alleviate these issues. First, we construct contrastive cropped samples and feed them into a convolutional neural network to predict the head points in each image patch. In addition to the classification and regression losses on these points, we incorporate a contrastive learning loss as auxiliary supervision to enhance the model's ability to distinguish foreground heads from the background. We also propose a multi-scale feature fusion module to obtain high-quality feature maps for detecting targets of different scales. Comparative experiments on public crowd counting datasets demonstrate that the proposed method achieves state-of-the-art performance.
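The abstract does not give the exact form of the auxiliary contrastive loss. As a hedged illustration only, the sketch below assumes a standard supervised contrastive (SupCon-style) loss computed over one embedding per predicted point, with each point labeled as a foreground head (1) or background (0). The function name `supervised_contrastive_loss`, the list-of-vectors input format, and the temperature of 0.1 are all assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit L2 norm (guarding against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Hypothetical SupCon-style loss over point embeddings.

    features: one embedding per predicted point (assumed input format).
    labels:   1 for a foreground head point, 0 for a background point.
    Points sharing a label are pulled together; points with different
    labels are pushed apart.
    """
    z = [_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Temperature-scaled cosine similarities from anchor i to every other point.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # Log-sum-exp over all non-anchor points, shifted by the max for stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability assigned to the anchor's positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With well-separated head and background embeddings (for example `[[1.0, 0.1], [0.9, 0.2]]` as heads and `[[-1.0, 0.1], [-0.8, -0.3]]` as background), the loss is close to zero; relabeling the points so that dissimilar embeddings are treated as positives raises it sharply. This is the behavior an auxiliary term of this kind would exploit to separate heads from background clutter.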