Li Tao, Meng Cheng, Xu Hongteng, Yu Jun
IEEE Trans Pattern Anal Mach Intell. 2024 Jul;46(7):4993-5007. doi: 10.1109/TPAMI.2024.3363780. Epub 2024 Jun 5.
Distribution comparison plays a central role in many machine learning tasks, such as data classification and generative modeling. In this study, we propose a novel low-complexity metric, called the Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions. In particular, we first project the two high-dimensional probability distributions using a Hilbert curve to obtain a coupling between them, and then calculate the transport distance between the two distributions in the original space according to this coupling. We show that the HCP distance is a proper metric and is well-defined for probability measures with bounded support. Furthermore, we demonstrate that the modified empirical HCP distance with the Lp cost in d-dimensional space converges to its population counterpart at a rate of no more than O(n^{-1/(2d)}). To mitigate the curse of dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that the HCP distance works as an effective low-complexity surrogate for the Wasserstein distance and overcomes the drawbacks of the sliced Wasserstein distance.
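The coupling step described in the abstract can be illustrated with a minimal sketch: sort the samples of each distribution along a Hilbert curve, pair them rank-by-rank, and average the transport cost in the original space. This is not the authors' implementation; it is a simplified 2-D, equal-sample-size illustration, and the helper names (`xy2d`, `hilbert_order`, `hcp_distance`) are hypothetical. The `xy2d` routine is the classic bit-manipulation conversion from grid coordinates to Hilbert index.

```python
# Illustrative sketch of the HCP coupling idea (NOT the paper's code).
# Assumptions: 2-D samples, equal sample sizes, hypothetical helper names.
import numpy as np

def xy2d(n: int, x: int, y: int) -> int:
    """Map grid cell (x, y) on an n x n grid (n a power of two) to its
    index along the Hilbert curve (classic bit-manipulation algorithm)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the recursion pattern stays consistent.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_order(X: np.ndarray, bits: int = 10) -> np.ndarray:
    """Permutation that sorts 2-D samples by their Hilbert-curve index."""
    n = 1 << bits
    # Normalize to the unit square, then quantize to the n x n grid.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against zero range
    G = np.clip(((X - lo) / span * (n - 1)).astype(int), 0, n - 1)
    keys = np.array([xy2d(n, int(gx), int(gy)) for gx, gy in G])
    return np.argsort(keys, kind="stable")

def hcp_distance(X: np.ndarray, Y: np.ndarray, p: int = 2) -> float:
    """Couple the i-th Hilbert-sorted point of X with the i-th of Y and
    average the Lp transport cost in the ORIGINAL space."""
    Xs, Ys = X[hilbert_order(X)], Y[hilbert_order(Y)]
    costs = np.linalg.norm(Xs - Ys, axis=1) ** p
    return float(costs.mean() ** (1.0 / p))
```

Like the sliced Wasserstein distance, this avoids solving an optimal transport problem, but the cost is evaluated between the original high-dimensional points rather than between one-dimensional projections, which is the property the abstract highlights.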