Chang Xiangyu, Wang Qingnan, Liu Yuewen, Wang Yu
IEEE Trans Cybern. 2017 Sep;47(9):2616-2627. doi: 10.1109/TCYB.2016.2627686. Epub 2016 Dec 1.
In high-dimensional data clustering, the cluster structure is commonly assumed to be confined to a limited number of relevant features rather than the entire feature set. However, identifying the relevant features and discovering the cluster structure in high-dimensional data remain challenging problems. To address them, this paper proposes a novel fuzzy c-means (FCM) model with sparse regularization (ℓq-norm regularization with 0 < q ≤ 1). The model reformulates the FCM objective function into a weighted between-cluster sum of squares form and imposes the sparse regularization on the feature weights. An algorithm is also developed to explicitly solve the proposed model. Compared with existing clustering models, the proposed model can shrink the weights of irrelevant (noisy) features to exactly zero, and it can be solved efficiently in analytic form when q = 1 or q = 1/2. Experiments on both synthetic and real-world data sets show that the proposed approach outperforms existing clustering approaches.
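To illustrate how a sparse penalty on feature weights can produce exact zeros, the following is a minimal sketch of the q = 1 case using the standard soft-thresholding operator. It is not the paper's algorithm: the abstract does not give the update equations, so the unit-norm rescaling, the function names `soft_threshold` and `sparse_feature_weights`, the per-feature `scores`, and the threshold `lam` are all assumptions made for illustration.

```python
import math


def soft_threshold(value, lam):
    """Soft-thresholding: shrink value toward zero by lam, clipping at zero."""
    magnitude = abs(value) - lam
    if magnitude <= 0.0:
        return 0.0
    return math.copysign(magnitude, value)


def sparse_feature_weights(scores, lam):
    """Map per-feature between-cluster scores to sparse, unit-norm weights.

    Assumption: weights are the soft-thresholded scores rescaled to unit
    Euclidean norm. This is an illustrative choice, not the paper's update.
    """
    shrunk = [soft_threshold(s, lam) for s in scores]
    norm = math.sqrt(sum(v * v for v in shrunk))
    if norm == 0.0:
        # Every score fell below the threshold; return all-zero weights.
        return shrunk
    return [v / norm for v in shrunk]


# Hypothetical scores: two informative features and two noisy ones.
scores = [5.0, 4.0, 0.3, 0.1]
weights = sparse_feature_weights(scores, lam=1.0)
print(weights)
```

In this sketch, features whose score falls below `lam` receive a weight of exactly zero rather than a small positive value, which is the behavior the abstract attributes to the sparse regularization.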