Lin Hui, Hong Xiaopeng, Ma Zhiheng, Wang Yaowei, Meng Deyu
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9112-9126. doi: 10.1109/TNNLS.2024.3435854. Epub 2025 May 2.
This article addresses the challenge of scale variation in crowd counting from a multidimensional measure-theoretic perspective. We first formulate crowd counting as a measure-matching problem, based on the assumption that both the scattered ground-truth annotations and the predicted density map can be expressed as discrete measures. In this context, we introduce the Sinkhorn counting loss and extend it to a semi-balanced form, which alleviates problems including entropic bias, distance destruction, and amount constraints. We then model the measure matching in a multidimensional space in order to learn counting from both location and scale. To achieve this, we extend the traditional 2-D coordinate support to 3-D by adding an axis that represents scale information, and we use a pyramid-based structure to learn the scale value for the predicted density. Extensive experiments on four challenging crowd-counting datasets, namely ShanghaiTech A, UCF-QNRF, JHU++, and NWPU, validate the proposed method. Code is released at https://github.com/LoraLinH/Multidimensional-Measure-Matching-for-Crowd-Counting.
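To make the measure-matching idea concrete, the following is a minimal sketch of a plain balanced entropic optimal-transport cost computed with Sinkhorn iterations between two discrete measures whose support points carry (x, y, scale) coordinates. It is not the authors' semi-balanced Sinkhorn counting loss. The function name `sinkhorn_cost`, the squared Euclidean ground cost, the coordinates normalized to [0, 1], and the values of `eps` and `n_iters` are all assumptions made for illustration.

```python
import math


def sinkhorn_cost(pred_pts, pred_mass, gt_pts, gt_mass, eps=0.1, n_iters=200):
    """Balanced entropic OT between two discrete measures (illustrative only).

    pred_pts / gt_pts: lists of support points, e.g. (x, y, scale) tuples.
    pred_mass / gt_mass: non-negative weights with equal total mass.
    Returns the transport cost <plan, cost> and the transport plan.
    """
    n, m = len(pred_pts), len(gt_pts)
    # Squared Euclidean ground cost; works for 2-D or 3-D support alike.
    cost = [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in gt_pts]
            for x in pred_pts]
    # Gibbs kernel of the entropic regularization.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]

    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the plan's row sums match the predicted masses.
        u = [pred_mass[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        # Rescale columns so the plan's column sums match the ground truth.
        v = [gt_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]

    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport = sum(plan[i][j] * cost[i][j]
                    for i in range(n) for j in range(m))
    return transport, plan


if __name__ == "__main__":
    # Two annotated heads as unit masses at (x, y, scale); values are made up.
    gt = [(0.1, 0.1, 0.2), (0.8, 0.8, 0.7)]
    # A prediction with correct locations but a wrong scale for the first head.
    pred = [(0.1, 0.1, 0.6), (0.8, 0.8, 0.7)]
    cost_2d, _ = sinkhorn_cost([p[:2] for p in pred], [1.0, 1.0],
                               [g[:2] for g in gt], [1.0, 1.0])
    cost_3d, _ = sinkhorn_cost(pred, [1.0, 1.0], gt, [1.0, 1.0])
    print("2-D support cost:", cost_2d)
    print("3-D support cost:", cost_3d)
```

In this toy setup the 2-D cost cannot see the scale error because the locations coincide, while the 3-D cost penalizes it. That is the intuition behind adding a scale axis to the support, shown here under the stated assumptions only.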