ECCA: Efficient Correntropy-Based Clustering Algorithm With Orthogonal Concept Factorization.

Author information

Yang Ben, Zhang Xuetao, Nie Feiping, Chen Badong, Wang Fei, Nan Zhixiong, Zheng Nanning

Publication information

IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):7377-7390. doi: 10.1109/TNNLS.2022.3142806. Epub 2023 Oct 5.

Abstract

One of the hottest topics in unsupervised learning is how to cluster large amounts of unlabeled data efficiently and effectively. To address this issue, we propose an orthogonal concept factorization (OCF) model that increases clustering effectiveness by restricting the degrees of freedom of the matrix factorization. For the OCF model, we give a fast optimization algorithm that involves only a few low-dimensional matrix operations, in contrast to the traditional concept factorization (CF) optimization algorithm, which involves dense matrix multiplications. To further improve clustering efficiency while suppressing the influence of the noise and outliers found in real-world data, an efficient correntropy-based clustering algorithm (ECCA) is proposed in this article. In contrast to plain OCF, ECCA constructs an anchor graph and performs OCF on that graph rather than directly on the original data, which not only further improves clustering efficiency but also inherits the high performance of spectral clustering. In particular, the anchor graph makes ECCA less sensitive to changes in data dimensionality, so it remains efficient at higher dimensions. Meanwhile, to handle the various complex noise and outliers in real-world data, correntropy is introduced into ECCA to measure the similarity between the matrix before and after decomposition, which can greatly improve clustering effectiveness and robustness. A novel and efficient half-quadratic optimization algorithm is then proposed to optimize the ECCA model quickly. Finally, extensive experiments on different real-world datasets and noisy datasets show that ECCA can achieve promising effectiveness and robustness while being tens to thousands of times more efficient than other state-of-the-art baselines.
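
The abstract names three building blocks (an anchor graph, a correntropy similarity, and half-quadratic optimization) without giving formulas. The following is a minimal sketch of generic textbook versions of each, not the authors' implementation. It assumes a Gaussian-kernel anchor graph over the k nearest anchors, Gaussian-kernel correntropy computed entrywise, and the usual half-quadratic reweighting in which each residual gets weight exp(-e^2 / (2 sigma^2)). All function names, the parameters `k` and `sigma`, and the random anchor selection are illustrative assumptions.

```python
import numpy as np


def anchor_graph(X, anchors, k=3, sigma=1.0):
    """Row-stochastic n x m affinity linking each point to its k nearest anchors.

    Assumed construction: Gaussian weights on squared Euclidean distance,
    normalized so every row sums to 1. The paper may use a different rule.
    """
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)  # (n, m)
    n, m = d2.shape
    idx = np.argsort(d2, axis=1)[:, :k]          # k nearest anchors per point
    rows = np.arange(n)[:, None]
    near = d2[rows, idx]                         # (n, k), ascending distances
    # Shifting by the row minimum leaves the normalized weights unchanged
    # and avoids underflow when all distances are large.
    w = np.exp(-(near - near[:, :1]) / (2.0 * sigma**2))
    Z = np.zeros((n, m))
    Z[rows, idx] = w / w.sum(axis=1, keepdims=True)
    return Z


def correntropy(A, B, sigma=1.0):
    """Empirical correntropy: mean Gaussian kernel of the entrywise residuals."""
    E = A - B
    return np.exp(-(E**2) / (2.0 * sigma**2)).mean()


def half_quadratic_weights(A, B, sigma=1.0):
    """Auxiliary weights for one half-quadratic step.

    With these weights held fixed, maximizing correntropy reduces to a
    weighted least-squares problem; large residuals (outliers) get
    weights near zero and so barely influence the next update.
    """
    E = A - B
    return np.exp(-(E**2) / (2.0 * sigma**2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))                          # n=200 points, d=50
    anchors = X[rng.choice(len(X), size=10, replace=False)] # m=10 random anchors
    Z = anchor_graph(X, anchors, k=3)
    print(Z.shape)  # factorization would act on this n x m matrix, not on X

    Z_noisy = Z.copy()
    Z_noisy[0, 0] += 10.0                                   # inject one outlier entry
    W = half_quadratic_weights(Z, Z_noisy, sigma=0.5)
    print(correntropy(Z, Z_noisy, sigma=0.5), W[0, 0])
```

In this sketch the matrix to be factorized has m columns regardless of the original feature dimension d, which illustrates why working on an anchor graph can keep the cost insensitive to d, as the abstract claims.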
