
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters.

Authors

Ren Min, Liu Peiyu, Wang Zhihao, Yi Jing

Affiliations

School of Information Science and Engineering, Shandong Normal University, Jinan, Shandong, China; School of Mathematic and Quantitative Economics, Shandong University of Finance and Economics, Jinan, Shandong, China; Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong, China.

School of Information Science and Engineering, Shandong Normal University, Jinan, Shandong, China; Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong, China.

Publication information

Comput Intell Neurosci. 2016;2016:2647389. doi: 10.1155/2016/2647389. Epub 2016 Nov 29.

Abstract

The fuzzy c-means algorithm (FCM) requires the number of clusters to be known in advance. To address this shortcoming, this paper proposed a new self-adaptive method for determining the optimal number of clusters. First, a density-based algorithm was put forward. Based on the characteristics of the dataset, it automatically determined the maximum possible number of clusters instead of relying on the empirical rule [Formula: see text], and it obtained optimal initial cluster centroids, mitigating the tendency of FCM to converge to a local minimum when the initial centroids are selected at random. Second, by introducing a penalty function, this paper proposed a new fuzzy clustering validity index based on fuzzy compactness and separation. The penalty ensured that, as the number of clusters approached the number of objects in the dataset, the value of the validity index did not decrease monotonically toward zero, a behavior that would otherwise cause the index to lose its robustness and its ability to decide the optimal number of clusters. Building on these two results, a self-adaptive FCM algorithm was put forward that estimates the optimal number of clusters through an iterative trial-and-error process. Finally, experiments on UCI, KDD Cup 1999, and synthetic datasets showed that the method not only determined the optimal number of clusters effectively but also reduced the number of FCM iterations while producing stable clustering results.
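The trial-and-error search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are stand-ins chosen because the abstract does not give the details: a farthest-point (maximin) initialization replaces the paper's density-based centroid selection, the classical Xie-Beni index (a compactness-to-separation ratio without the paper's penalty term) replaces the proposed validity index, and the upper bound c_max is passed in by hand. The names maximin_init, fcm, xie_beni, choose_c, and demo_data are hypothetical.

```python
import numpy as np

def maximin_init(X, c):
    """Stand-in initializer (assumption): farthest-point selection, not the paper's density-based method."""
    # Start from the data point closest to the overall mean.
    idx = [int(np.argmin(((X - X.mean(axis=0)) ** 2).sum(axis=1)))]
    d2 = ((X - X[idx[0]]) ** 2).sum(axis=1)
    for _ in range(1, c):
        nxt = int(np.argmax(d2))  # point farthest from all chosen centroids
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].copy()

def fcm(X, c, m=2.0, max_iter=300, tol=1e-6):
    """Standard fuzzy c-means. Returns centroids V, memberships U, and the iteration count."""
    V = maximin_init(X, c)
    for it in range(1, max_iter + 1):
        # Squared distances from every point to every centroid, shape (n, c).
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Centroid update: mean of the data weighted by u_ik^m.
        W = U ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(V_new - V)
        V = V_new
        if shift < tol:
            break
    return V, U, it

def xie_beni(X, V, U, m=2.0):
    """Xie-Beni index (stand-in for the paper's index): lower values indicate a better partition."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    cd2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(cd2, np.inf)  # ignore the distance of a centroid to itself
    return compactness / (len(X) * cd2.min())

def choose_c(X, c_max, m=2.0):
    """Run FCM for c = 2..c_max and return the c with the smallest index value."""
    scores = {}
    for c in range(2, c_max + 1):
        V, U, _ = fcm(X, c, m=m)
        scores[c] = xie_beni(X, V, U, m)
    return min(scores, key=scores.get), scores

def demo_data():
    """Hypothetical synthetic data: three well-separated 2-D Gaussian blobs."""
    rng = np.random.default_rng(42)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    return np.vstack([ctr + 0.5 * rng.standard_normal((50, 2)) for ctr in centers])

if __name__ == "__main__":
    best_c, scores = choose_c(demo_data(), c_max=5)
    print(best_c, scores)
```

Because the Xie-Beni index is minimized, choose_c picks the candidate with the lowest score. The paper's index adds a penalty function precisely because ratio-type indices of this kind can become uninformative as c approaches the number of objects, so this sketch should only be trusted for small c_max.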


