Wang Yi, Hou Junhui, Hou Xinyu, Chau Lap-Pui
IEEE Trans Image Process. 2021;30:2876-2887. doi: 10.1109/TIP.2021.3055632. Epub 2021 Feb 12.
In this article, we propose a novel self-training approach named Crowd-SDNet that enables a typical object detector trained only with point-level annotations (i.e., objects are labeled with points) to estimate both the center points and sizes of crowded objects. Specifically, during training, we use the available point annotations to directly supervise the estimation of object center points. Based on a locally-uniform distribution assumption, we initialize pseudo object sizes from the point-level supervisory information; these sizes then guide the regression of object sizes via a crowdedness-aware loss. Meanwhile, we propose a confidence- and order-aware refinement scheme that continuously refines the initial pseudo object sizes, so that the detector becomes progressively better at simultaneously detecting and counting objects in crowds. Moreover, to address extremely crowded scenes, we propose an effective decoding method to improve the detector's representation ability. Experimental results on the WiderFace benchmark show that our approach significantly outperforms state-of-the-art point-supervised methods on both detection and counting tasks: it improves the average precision by more than 10% and reduces the counting error by 31.2%. In addition, our method obtains the best results on the crowd counting and localization datasets (i.e., ShanghaiTech and NWPU-Crowd) and the vehicle counting datasets (i.e., CARPK and PUCPR+) compared with state-of-the-art counting-by-detection methods. The code will be publicly available at https://github.com/WangyiNTU/Point-supervised-crowd-detection.
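The abstract states that pseudo object sizes are initialized from the point annotations under a locally-uniform distribution assumption, but it does not give the rule itself. The following is a minimal sketch of one plausible reading, assuming that nearby objects have similar sizes and spacing, so that the distance from a point to its nearest annotated neighbour can stand in for the object's size. The function names `init_pseudo_sizes` and `to_boxes`, the `default_size` parameter, and the use of square boxes are all hypothetical choices made for illustration, not taken from the Crowd-SDNet code.

```python
import math


def init_pseudo_sizes(points, default_size=32.0):
    """Return one pseudo size per annotated point.

    Assumption (not from the paper's code): the size of an object is
    approximated by the distance to its nearest annotated neighbour.
    `default_size` is a hypothetical fallback for an isolated point.
    """
    sizes = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            (math.hypot(xi - xj, yi - yj)
             for j, (xj, yj) in enumerate(points) if j != i),
            default=default_size,  # used only when there is no other point
        )
        sizes.append(nearest)
    return sizes


def to_boxes(points, sizes):
    """Turn centre points and pseudo sizes into square (x1, y1, x2, y2) boxes."""
    return [
        (x - s / 2, y - s / 2, x + s / 2, y + s / 2)
        for (x, y), s in zip(points, sizes)
    ]


# Two close points and one distant point.
points = [(10.0, 10.0), (13.0, 14.0), (50.0, 50.0)]
sizes = init_pseudo_sizes(points)
boxes = to_boxes(points, sizes)
print(sizes)
print(boxes[0])
```

In this sketch the two close points receive small pseudo boxes, while the distant point receives a much larger one. That over-estimate in sparse regions is the kind of error a later refinement stage, such as the confidence- and order-aware scheme the abstract mentions, would need to correct.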