Computer Science Department, Purdue University, West Lafayette, Indiana, United States of America.
PLoS One. 2020 Dec 23;15(12):e0243485. doi: 10.1371/journal.pone.0243485. eCollection 2020.
Local graph clustering is an important machine learning task that aims to find a well-connected cluster near a set of seed nodes. Recent results have revealed that incorporating higher order information significantly enhances the results of graph clustering techniques. The majority of existing research in this area focuses on spectral graph theory-based techniques. However, an alternative perspective on local graph clustering arises from using max-flow and min-cut on the objectives, which offer distinctly different guarantees. For instance, a new method called capacity releasing diffusion (CRD) was recently proposed and shown to preserve local structure around the seeds better than spectral methods. The method was also the first local clustering technique that is not subject to the quadratic Cheeger inequality by assuming a good cluster near the seed nodes. In this paper, we propose a local hypergraph clustering technique called hypergraph CRD (HG-CRD) by extending the CRD process to cluster based on higher order patterns, encoded as hyperedges of a hypergraph. Moreover, we theoretically show that HG-CRD gives results about a quantity called motif conductance, rather than a biased version used in previous experiments. Experimental results on synthetic datasets and real world graphs show that HG-CRD enhances the clustering quality.
局部图聚类是一项重要的机器学习任务,旨在在一组种子节点附近找到一个连接良好的簇。最近的研究结果表明,结合更高阶信息可以显著提高图聚类技术的效果。该领域的大多数现有研究都集中在基于谱图理论的技术上。然而,另一种从目标上使用最大流和最小割的局部图聚类的观点,提供了截然不同的保证。例如,最近提出了一种称为容量释放扩散(CRD)的新方法,它比谱方法更好地保留了种子周围的局部结构。该方法也是第一个通过假设种子附近有一个良好的簇,而不受二次 Cheeger 不等式限制的局部聚类技术。在本文中,我们通过将 CRD 过程扩展到基于高阶模式的聚类来提出一种称为超图 CRD(HG-CRD)的局部超图聚类技术,该方法编码为超图的超边。此外,我们从理论上证明了 HG-CRD 给出了一个称为模体传导率的量的结果,而不是之前实验中使用的有偏差的版本。在合成数据集和真实世界图上的实验结果表明,HG-CRD 提高了聚类质量。