Nie Hebing, Li Qi, Wang Zheng, Zhao Haifeng, Nie Feiping
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):18105-18119. doi: 10.1109/TNNLS.2023.3311789. Epub 2024 Dec 2.
Graph-based semisupervised learning exploits the graph topology underlying the samples and has become one of the most active research areas in machine learning in recent years. Nevertheless, existing graph-based methods suffer from two shortcomings. First, they construct graphs in the original high-dimensional space, where noisy and redundant features easily disturb the result, yielding low-quality graphs that cannot accurately portray the relationships between data points. Second, most existing models rest on a Gaussian assumption and therefore cannot capture the local submanifold structure of the data, which reduces the discriminative power of the learned low-dimensional representations. This article proposes semisupervised subspace learning with adaptive pairwise graph embedding (APGE). APGE first builds a k-nearest neighbor graph on the labeled data to learn local discriminant embeddings that explore the intrinsic structure of non-Gaussian labeled data, i.e., the submanifold structure. Then, a k-nearest neighbor graph is constructed on all samples and mapped to GE learning to adaptively explore the global structure of all samples. Clustering unlabeled data and their labeled neighbors into the same submanifold, so that they share the same label information, improves the discriminative ability of the embedded data. In addition, adaptive neighborhood learning is used to learn the graph structure in the continuously optimized subspace, so that the optimal graph matrix and projection matrix are ultimately learned, which makes the method highly robust.
Meanwhile, a rank constraint is imposed on the Laplacian matrix of the similarity matrix of all samples so that the number of connected components in the learned similarity matrix exactly equals the number of classes, which makes the graph structure clearer and the relationships between neighboring sample points more explicit. Finally, experiments on several synthetic and real-world datasets show that the method performs well in exploring local structure and in classification tasks.
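
The link between a rank constraint on the Laplacian and the number of connected components rests on a standard spectral fact: for a nonnegative symmetric similarity matrix S over n samples, the Laplacian L = D - S has exactly c zero eigenvalues (equivalently, rank n - c) when the graph has c connected components. The following is a minimal sketch of that fact on a k-nearest neighbor graph, not the APGE optimization itself; the function names `knn_similarity` and `count_components`, the toy data, and the binary edge weights are all assumptions made for illustration.

```python
import numpy as np


def knn_similarity(X, k):
    """Hypothetical helper: symmetric 0/1 k-nearest-neighbor similarity matrix.

    Rows of X are samples. Binary weights are an illustrative assumption.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    # Indices of the k closest samples for each row.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nn_idx.ravel()] = 1.0
    # Symmetrize: keep an edge if either endpoint selected the other.
    return np.maximum(S, S.T)


def count_components(S, tol=1e-8):
    """Hypothetical helper: count connected components via zero eigenvalues of L = D - S."""
    L = np.diag(S.sum(axis=1)) - S
    eigvals = np.linalg.eigvalsh(L)  # L is symmetric, eigenvalues are real
    return int(np.sum(eigvals < tol))


# Toy data (assumed): two well-separated groups of five points on a line.
X = np.array([0, 1, 2, 3, 4, 100, 101, 102, 103, 104], dtype=float).reshape(-1, 1)
S = knn_similarity(X, k=2)
L = np.diag(S.sum(axis=1)) - S

n_components = count_components(S)
rank_L = int(np.linalg.matrix_rank(L))
print(n_components, rank_L)
```

On this toy data the graph splits into two components, so the Laplacian has rank n - 2. Constraining the rank to n - c during learning, as the abstract describes, forces the learned graph to have exactly c components, one per class.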