Detecting Meaningful Clusters From High-Dimensional Data: A Strongly Consistent Sparse Center-Based Clustering Approach.

Author information

Chakraborty Saptarshi, Das Swagatam

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2894-2908. doi: 10.1109/TPAMI.2020.3047489. Epub 2022 May 5.

Abstract

In the context of high-dimensional clustering, feature weighting has gained considerable importance over the years as a way to capture the relative importance of different features in revealing the cluster structure of a dataset. However, the popular techniques in this area either fail to perform feature selection or do not preserve the simplicity of Lloyd's heuristic for solving the k-means problem and related problems. In this paper, we propose a Lasso Weighted k-means (LW-k-means) algorithm, a simple yet efficient sparse clustering procedure for high-dimensional data where the number of features (p) can be much higher than the number of observations (n). The LW-k-means method imposes an ℓ1 regularization term involving the feature weights directly to induce feature selection in a sparse clustering framework. We develop a simple block-coordinate descent type algorithm, with time complexity resembling that of Lloyd's method, to optimize the proposed objective. In addition, we establish the strong consistency of the LW-k-means procedure. Such an analysis of large-sample properties is, in general, not available for conventional sparse k-means algorithms. LW-k-means is tested on a number of synthetic and real-life datasets; through a detailed experimental analysis, we find that the method is highly competitive against the baselines as well as the state-of-the-art procedures for center-based high-dimensional clustering, not only in terms of clustering accuracy but also with respect to computational time.
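The abstract does not state the LW-k-means objective, so the following is only a minimal sketch of the general idea it describes: a Lloyd-style block-coordinate descent in which an ℓ1 penalty on the feature weights drives uninformative features to exactly zero. The objective in the docstring, the function name `sparse_weighted_kmeans`, and the parameters `lam` and `init` are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def sparse_weighted_kmeans(X, k, lam=0.1, n_iter=20, init=None, seed=0):
    """Toy block-coordinate descent for the HYPOTHETICAL objective
        J = sum_l [ w_l * D_l + lam * |w_l| + 0.5 * (w_l - 1)^2 ],  w_l >= 0,
    where D_l is the mean within-cluster squared deviation of feature l.
    This is an illustrative stand-in, NOT the LW-k-means objective.
    Assumes no feature is constant (standardization divides by the std)."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # each feature: mean 0, variance 1
    n, p = X.shape
    if init is None:
        # Random distinct rows as starting centers
        init = np.random.default_rng(seed).choice(n, size=k, replace=False)
    centers = X[np.asarray(init)].copy()
    w = np.ones(p)  # start with every feature fully weighted
    for _ in range(n_iter):
        # Block 1: assign each point to the nearest center in weighted distance
        diff = X[:, None, :] - centers[None, :, :]
        labels = ((diff ** 2) * w).sum(axis=2).argmin(axis=1)
        # Block 2: each center becomes the mean of its members (as in Lloyd's method)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
        # Block 3: closed-form weight update; the l1 term acts as a threshold
        D = ((X - centers[labels]) ** 2).mean(axis=0)
        w = np.maximum(0.0, 1.0 - D - lam)
    return labels, centers, w
```

Each pass costs about as much as one Lloyd iteration plus an O(np) weight update. In this toy objective, a standardized feature whose within-cluster dispersion stays near 1 (it does not help separate the clusters) receives weight zero, which is the feature-selection effect an ℓ1 penalty is meant to produce. The result depends on initialization, as with Lloyd's method.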
