IEEE Trans Pattern Anal Mach Intell. 2021 Aug;43(8):2739-2751. doi: 10.1109/TPAMI.2020.2974830. Epub 2021 Jul 1.
We introduce a detection framework for dense crowd counting that eliminates the need for the prevalent density regression paradigm. Typical counting models predict a crowd density map for an image rather than detecting every person. These regression methods generally fail to localize persons accurately enough for most applications other than counting. Hence, we adopt an architecture that locates every person in the crowd, sizes each spotted head with a bounding box, and then counts them. Compared to ordinary object or face detectors, designing such a detection system poses certain unique challenges. Some of them are direct consequences of the huge diversity in dense crowds, along with the need to predict boxes contiguously. We solve these issues and develop our LSC-CNN model, which can reliably detect the heads of people across sparse to dense crowds. LSC-CNN employs a multi-column architecture with top-down feature modulation to better resolve persons and produce refined predictions at multiple resolutions. Interestingly, the proposed training regime requires only point head annotations, yet can estimate the approximate size of heads. We show that LSC-CNN not only localizes better than existing density regressors, but also outperforms them in counting. The code for our approach is available at https://github.com/val-iisc/lsc-cnn.
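The abstract states that training needs only point head annotations while still yielding approximate head sizes, but it does not say how the sizes are obtained. The sketch below illustrates one plausible heuristic, assumed here for illustration and not confirmed by the abstract: in a dense crowd, the distance from a head point to its nearest neighbour gives a rough upper bound on head size. The function name `pseudo_boxes_from_points` and the parameters `default_size` and `max_size` are hypothetical and are not taken from the LSC-CNN code base.

```python
import numpy as np


def pseudo_boxes_from_points(points, default_size=32.0, max_size=64.0):
    """Turn (x, y) head points into square boxes (x1, y1, x2, y2).

    Assumption (not from the abstract): box side = distance to the
    nearest other annotated head, capped at max_size pixels.
    """
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    n = len(pts)
    if n == 0:
        return np.zeros((0, 4))
    if n == 1:
        # A lone head has no neighbour, so fall back to a default size.
        sizes = np.array([default_size])
    else:
        # Pairwise Euclidean distances; O(n^2) memory, fine for a sketch.
        diff = pts[:, None, :] - pts[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)  # ignore distance to self
        sizes = dist.min(axis=1)
    # Cap the size so isolated heads in sparse regions stay reasonable.
    sizes = np.minimum(sizes, max_size)
    half = sizes[:, None] / 2.0
    return np.hstack([pts - half, pts + half])


# Two close heads and one isolated head.
boxes = pseudo_boxes_from_points([(10, 10), (13, 14), (100, 100)])
count = len(boxes)  # in a detection framework, the count is the number of boxes
```

In a detection-based counter the crowd count falls out as the number of predicted boxes, which is why the sketch reads the count directly from `len(boxes)` instead of integrating a density map.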