Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, Shenzhen 518055, Guangdong, China; Shenzhen Medical Biometrics Perception and Analysis Engineering Laboratory, Harbin Institute of Technology, Shenzhen, Shenzhen 518055, Guangdong, China.
School of Automation, Guangdong University of Technology, Guangzhou 510006, Guangdong, China.
Neural Netw. 2018 Dec;108:83-96. doi: 10.1016/j.neunet.2018.08.007. Epub 2018 Aug 14.
Low-rank representation (LRR) has attracted much attention in the data mining community. However, it has two problems that greatly limit its applications: (1) it cannot discover the intrinsic structure of data because it neglects the local structure of the data; (2) the graph it produces is not optimal for clustering. To solve these problems and improve clustering performance, we propose a novel graph learning method named low-rank representation with adaptive graph regularization (LRR_AGR). First, a distance regularization term and a non-negative constraint are jointly integrated into the LRR framework, which enables the method to exploit both the global and the local information of the data for graph learning. Second, a novel rank constraint is introduced into the model, which encourages the learned graph to have a very clear clustering structure, i.e., exactly c connected components for data with c clusters. Both components help the method learn an optimal graph that reveals the intrinsic structure of the data. Finally, an efficient iterative algorithm is provided to optimize the model. Experimental results on synthetic and real datasets show that the proposed method can significantly improve clustering performance.
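The link between a rank constraint and "exactly c connected components" can be illustrated with a standard fact from spectral graph theory: for a non-negative symmetric affinity matrix, the number of zero eigenvalues of the graph Laplacian equals the number of connected components, so a graph on n nodes with c components has a Laplacian of rank n - c. The sketch below is not the LRR_AGR algorithm; it only checks this property on a toy graph. The abstract does not give the exact form of the constraint, so treating it as a Laplacian rank condition is an assumption here, and the function name `count_components` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def count_components(W, tol=1e-8):
    """Count connected components as the number of (near-)zero
    eigenvalues of the unnormalized graph Laplacian L = D - W."""
    W = (W + W.T) / 2.0               # symmetrize the affinity matrix
    L = np.diag(W.sum(axis=1)) - W    # degree matrix minus affinities
    eigvals = np.linalg.eigvalsh(L)   # real eigenvalues, ascending order
    return int(np.sum(eigvals < tol))

# Toy affinity matrix with two disconnected blocks (3 nodes and 2 nodes).
W_blocks = np.zeros((5, 5))
W_blocks[:3, :3] = 1.0
W_blocks[3:, 3:] = 1.0
np.fill_diagonal(W_blocks, 0.0)

# Same graph with one weak edge bridging the two blocks.
W_bridged = W_blocks.copy()
W_bridged[2, 3] = W_bridged[3, 2] = 0.1

c_blocks = count_components(W_blocks)     # two separate blocks
c_bridged = count_components(W_bridged)   # blocks merged by the weak edge
print(c_blocks, c_bridged)
```

Even a single weak edge between the blocks removes one zero eigenvalue, which suggests why a constraint on the number of components yields a sharper cluster structure than a graph that is merely sparse.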