Yang Ying, Chen Haoyu, Wu Haoshen
College of Information and Intelligence, Hunan Agricultural University, Changsha, China.
New Energy College, Xi'an Shiyou University, Xi'an, China.
PeerJ Comput Sci. 2023 Oct 5;9:e1600. doi: 10.7717/peerj-cs.1600. eCollection 2023.
Missing data poses a challenge for clustering algorithms: traditional methods typically impute (pad) the incomplete data first and cluster afterwards. To combine imputation and clustering into a single process and improve clustering accuracy, a generalized fuzzy clustering framework is proposed based on the optimal completion strategy (OCS) and the nearest prototype strategy (NPS), and four improved algorithms are developed within it. Feature weights are introduced to reduce the influence of outliers on the cluster centers, and kernel functions are used to handle data that are not linearly separable. The proposed algorithms are evaluated in terms of correct clustering rate, number of iterations, and external evaluation indexes on nine datasets from the UCI (University of California, Irvine) Machine Learning Repository. The experimental results indicate that, under varying missing rates, the feature weighted kernel fuzzy C-means algorithm with NPS (NPS-WKFCM) and the feature weighted kernel fuzzy C-means algorithm with OCS (OCS-WKFCM) achieve higher clustering accuracy than seven conventional algorithms.
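To make the idea of combining imputation and clustering concrete, the following is a minimal sketch of plain fuzzy C-means with OCS and NPS updates as they are commonly described. It is not the WKFCM algorithms of the paper: feature weights and kernel functions are omitted, and the function name `fcm_incomplete`, its parameters, the column-mean initialisation, and the use of a partial distance for NPS are all assumptions made for illustration.

```python
import numpy as np

def fcm_incomplete(X, c, strategy="ocs", m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical sketch: fuzzy C-means on data with NaN-marked missing values.

    strategy="ocs": missing entries are re-estimated as a membership-weighted
                    average of the cluster centers.
    strategy="nps": missing entries are copied from the nearest center, where
                    "nearest" is judged on the observed features only.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumption: start from column means of the observed values.
    Xf = np.where(missing, np.nanmean(X, axis=0), X)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ Xf) / Um.sum(axis=0)[:, None]               # centers, (c, d)
        D2 = ((Xf[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        D2 = np.maximum(D2, 1e-12)                              # avoid divide-by-zero
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)            # standard FCM update
        if strategy == "ocs":
            W = U_new ** m
            est = (W @ V) / W.sum(axis=1, keepdims=True)        # (n, d)
        elif strategy == "nps":
            # Squared distance over observed features only; missing ones add 0.
            diff = np.where(missing[:, None, :], 0.0, Xf[:, None, :] - V[None, :, :])
            nearest = (diff ** 2).sum(axis=2).argmin(axis=1)
            est = V[nearest]
        else:
            raise ValueError("strategy must be 'ocs' or 'nps'")
        Xf = np.where(missing, est, X)          # only missing entries are updated
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V, Xf
```

Calling `fcm_incomplete(X, c=3, strategy="nps")` on an array with `np.nan` in the missing positions returns the membership matrix, the cluster centers, and the completed data. The imputed values are refined inside the clustering loop rather than fixed beforehand, which is the property the framework above relies on.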