Liu Zelong, Zhou Xin, Zhou Tao, Chen Yuanyuan
College of Computer Science, Sichuan University, Chengdu 610000, China.
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China.
Sensors (Basel). 2023 Sep 29;23(19):8177. doi: 10.3390/s23198177.
Estimating the number of objects in a single image or video frame is a challenging yet pivotal task in computer vision. Its growing significance stems from applications across many domains, including public safety and urban planning. Among object counting tasks, crowd counting is particularly notable for its critical role in both. However, intricate image backgrounds often lead to misidentification, in which complex background regions are mistaken for foreground, thereby inflating prediction errors. In addition, the uneven distribution of crowd density within the foreground further increases the network's prediction error. This paper introduces a novel three-branch architecture that jointly incorporates hierarchical foreground information and global scale information into density map estimation to achieve more precise counts. Hierarchical foreground information guides the network to treat regions of different density differently, while global scale information assesses the overall density level of the image and adjusts the model's global predictions accordingly. We also systematically investigate and compare three candidate locations for integrating hierarchical foreground information into the density estimation network and identify the most effective placement. Through extensive comparative experiments on three datasets, we demonstrate the superior performance of the proposed method.
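The abstract does not specify how the three branches are combined, so the following is only a minimal sketch of the general idea, not the authors' network. It assumes that a raw density map is reweighted per pixel by a discrete foreground level (0 for background, higher values for denser regions) and then multiplied by a single global scale factor, with the count taken as the sum over the resulting density map. The names `fuse_density`, `count_from_density`, `level_map`, `level_weights`, and `global_scale` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]


def fuse_density(
    raw_density: Grid,
    level_map: List[List[int]],
    level_weights: Sequence[float],
    global_scale: float,
) -> Grid:
    """Reweight a raw density map (illustrative assumption, not the paper's method).

    Each pixel is multiplied by the weight of its foreground level and by one
    image-wide scale factor. A weight of 0.0 for level 0 suppresses background.
    """
    if len(raw_density) != len(level_map):
        raise ValueError("density map and level map must have the same height")
    fused: Grid = []
    for density_row, level_row in zip(raw_density, level_map):
        if len(density_row) != len(level_row):
            raise ValueError("density map and level map must have the same width")
        fused.append(
            [
                global_scale * level_weights[level] * value
                for value, level in zip(density_row, level_row)
            ]
        )
    return fused


def count_from_density(density: Grid) -> float:
    """In density-map counting, the estimated count is the sum of the map."""
    return sum(sum(row) for row in density)


# Toy 2x2 example with hypothetical values.
raw = [[0.5, 0.25], [0.5, 1.0]]
levels = [[0, 1], [1, 2]]          # 0 = background, 1 = sparse, 2 = dense
weights = (0.0, 1.0, 1.5)          # one weight per foreground level
fused = fuse_density(raw, levels, weights, global_scale=2.0)
estimated_count = count_from_density(fused)
```

In this toy case the background pixel contributes nothing to the count, which mirrors the motivation stated in the abstract: background mistaken for foreground otherwise adds spurious density.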