IEEE Trans Cybern. 2021 Oct;51(10):4822-4833. doi: 10.1109/TCYB.2020.3034316. Epub 2021 Oct 12.
With the development of deep neural networks, the performance of crowd counting and pixel-wise density estimation continues to improve. Despite this, two challenging problems remain in this field: 1) current supervised learning needs a large amount of training data, which is difficult to collect and annotate, and 2) existing methods do not generalize well to unseen domains. A recently released synthetic crowd dataset alleviates these two problems. However, the domain gap between real-world data and synthetic images degrades model performance. To reduce this gap, in this article, we propose a domain-adaptation-style crowd counting method, which can effectively adapt a model from synthetic data to specific real-world scenes. It consists of multilevel feature-aware adaptation (MFA) and structured density map alignment (SDA). Specifically, MFA encourages the model to extract domain-invariant features from multiple layers, and SDA ensures that the network outputs fine density maps with a reasonable distribution on the real domain. Finally, we evaluate the proposed method on four mainstream surveillance crowd datasets: ShanghaiTech Part B, WorldExpo'10, Mall, and UCSD. Extensive experiments show that our approach outperforms state-of-the-art methods on the same cross-domain counting problem.
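As background for the pixel-wise density estimation setting mentioned above, the following is a minimal sketch of how a density map relates to a crowd count. It is not the MFA or SDA component, which the abstract does not describe in implementation detail. It assumes the common convention that each annotated head contributes a normalized Gaussian blob, so the map sums to the number of people. The function names `density_map` and `crowd_count` and the parameter `sigma` are hypothetical and chosen only for illustration.

```python
import math


def density_map(points, height, width, sigma=1.5):
    """Build a height x width density map from head annotations.

    points: list of (row, col) head positions (illustrative input format).
    Each person is spread as a Gaussian blob normalized to sum to 1,
    so the whole map sums to the number of annotated people.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        # Unnormalized Gaussian weights over the whole image grid.
        weights = [
            [
                math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2))
                for c in range(width)
            ]
            for r in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalize so this person contributes exactly 1 to the map.
        for r in range(height):
            for c in range(width):
                dmap[r][c] += weights[r][c] / total
    return dmap


def crowd_count(dmap):
    """The estimated count is the sum over all pixels of the density map."""
    return sum(map(sum, dmap))


heads = [(2, 3), (5, 5), (7, 1)]
dmap = density_map(heads, height=10, width=10)
print(round(crowd_count(dmap), 6))
```

Under this convention, a counting network is trained to regress such a map from an image, and the predicted count is read off by summing the predicted map.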