Pardamean Bens, Abid Faizal, Cenggoro Tjeng Wawan, Elwirehardja Gregorius Natanael, Muljo Hery Harjono
Computer Science Department, BINUS Graduate Program - Master of Computer Science Program, Bina Nusantara University, Jakarta, Indonesia.
Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia.
PeerJ Comput Sci. 2022 Sep 22;8:e1067. doi: 10.7717/peerj-cs.1067. eCollection 2022.
In recent years, the performance of people-counting models has improved so dramatically that they can now be deployed in practical settings. However, current models can only count all of the people captured in the input closed-circuit television (CCTV) footage. Often, we only want to count people in a specific Region-of-Interest (RoI) within the footage. Unfortunately, simple approaches such as masking out the area outside the RoI degrade the performance of the models. Therefore, we developed a novel learning strategy that enables a deep-learning-based people-counting model to count people only in a given RoI. In the proposed method, the people-counting model has two heads attached on top of a crowd-counting backbone network. One head learns to count people inside the RoI, and the other learns to negate the people count outside the RoI. We named the proposed method Gap Regularizer and tested it on ResNet-50, ResNet-101, CSRNet, and SFCN. The experimental results showed that Gap Regularizer can reduce the mean absolute error (MAE), root mean square error (RMSE), and grid average mean error (GAME) of ResNet-50, the smallest CNN model tested, by up to 45.2%, 41.25%, and 46.43%, respectively. On shallow models such as CSRNet, the regularizer can also increase the structural similarity index (SSIM) by up to 248.65%, in addition to reducing the MAE, RMSE, and GAME. Gap Regularizer can also improve the performance of SFCN, a deep CNN model with back-end features, by up to 17.22% and 10.54% compared to its standard version. Moreover, the impact of Gap Regularizer on these two models is generally statistically significant (p-value < 0.05) on the MOT17-09, MOT20-02, and RHC datasets. However, the method has a limitation: it is unable to make a significant impact on deep models without back-end features, such as ResNet-101.