Franca Guilherme, Rizzo Maria L, Vogelstein Joshua T
IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4411-4425. doi: 10.1109/TPAMI.2020.2998120. Epub 2021 Nov 3.
Energy statistics was proposed by Székely in the 80's inspired by Newton's gravitational potential in classical mechanics and it provides a model-free hypothesis test for equality of distributions. In its original form, energy statistics was formulated in euclidean spaces. More recently, it was generalized to metric spaces of negative type. In this paper, we consider a formulation for the clustering problem using a weighted version of energy statistics in spaces of negative type. We show that this approach leads to a quadratically constrained quadratic program in the associated kernel space, establishing connections with graph partitioning problems and kernel methods in machine learning. To find local solutions of such an optimization problem, we propose kernel k-groups, which is an extension of Hartigan's method to kernel spaces. Kernel k-groups is cheaper than spectral clustering and has the same computational cost as kernel k-means (which is based on Lloyd's heuristic) but our numerical results show an improved performance, especially in higher dimensions. Moreover, we verify the efficiency of kernel k-groups in community detection in sparse stochastic block models which has fascinating applications in several areas of science.
能量统计量由塞凯利在80年代提出,其灵感来源于经典力学中的牛顿引力势,它为分布相等提供了一种无模型的假设检验。在其原始形式中,能量统计量是在欧几里得空间中制定的。最近,它被推广到负型度量空间。在本文中,我们考虑在负型空间中使用能量统计量的加权版本来解决聚类问题。我们表明,这种方法在相关核空间中导致一个二次约束二次规划,从而建立了与机器学习中的图划分问题和核方法的联系。为了找到此类优化问题的局部解,我们提出了核k组方法,它是哈蒂根方法到核空间的扩展。核k组方法比谱聚类更便宜,并且与核k均值(基于劳埃德启发式算法)具有相同的计算成本,但我们的数值结果显示其性能有所提高,尤其是在高维情况下。此外,我们验证了核k组方法在稀疏随机块模型的社区检测中的效率,该模型在多个科学领域有着引人入胜的应用。