Kang Zhao, Xie Xuanting, Li Bingheng, Pan Erlin
IEEE Trans Neural Netw Learn Syst. 2024 Oct 14;PP. doi: 10.1109/TNNLS.2024.3473618.
In today's data-driven digital era, the volume and complexity of collected data, such as multiview, non-Euclidean, and multirelational data, are growing exponentially or even faster. Clustering, which extracts valid knowledge from data without supervision, is extremely useful in practice. However, existing methods are developed independently, each handling one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first use graph filtering (GF) to fuse geometric structure and attribute information. We then reduce complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving (SP) regularizer. We illustrate the clusterability of our proposed method both theoretically and experimentally. In particular, we deploy CDC to graph data of size 111 M.
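The graph filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which filter CDC uses, so the code below assumes a common low-pass choice, (I - L/2)^k applied to the attribute matrix, where L is the symmetric normalized Laplacian of the graph with self-loops. The function name `low_pass_graph_filter` and the parameter `order` are hypothetical, and the dense NumPy arrays are for readability only; a linear-complexity implementation would need sparse matrices.

```python
import numpy as np


def low_pass_graph_filter(adj, features, order=2):
    """Smooth node attributes over the graph with (I - L/2)^order.

    Assumed filter for illustration, not necessarily the one used in CDC.
    adj:      (n, n) symmetric adjacency matrix
    features: (n, d) node attribute matrix
    """
    adj = np.asarray(adj, dtype=float)
    features = np.asarray(features, dtype=float)
    n = adj.shape[0]

    # Add self-loops so every node has nonzero degree.
    adj_hat = adj + np.eye(n)
    deg = adj_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)

    # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}.
    norm_adj = d_inv_sqrt[:, None] * adj_hat * d_inv_sqrt[None, :]

    # With L = I - norm_adj, the filter I - L/2 equals (I + norm_adj) / 2.
    filt = 0.5 * (np.eye(n) + norm_adj)

    smoothed = features
    for _ in range(order):
        smoothed = filt @ smoothed
    return smoothed
```

Under these assumptions, each application pulls a node's attributes toward those of its neighbors, so nodes in the same densely connected group end up with similar representations, which is the sense in which structure and attributes are fused before clustering.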