Parker Jonathon K, Hall Lawrence O
Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA.
IEEE Trans Fuzzy Syst. 2014 Oct;22(5):1229-1244. doi: 10.1109/TFUZZ.2013.2286993. Epub 2013 Oct 23.
Many algorithms designed to accelerate the Fuzzy c-Means (FCM) clustering algorithm randomly sample the data. Typically, no statistical method is used to estimate the subsample size, despite the impact subsample sizes have on speed and quality. This paper introduces two new accelerated algorithms, GOFCM and MSERFCM, that use a statistical method to estimate the subsample size. GOFCM, a variant of SPFCM, also leverages progressive sampling. MSERFCM, a variant of rseFCM, gains a speedup from improved initialization. A general, novel stopping criterion for accelerated clustering is introduced. The new algorithms are compared to FCM and four accelerated variants of FCM. GOFCM's speedup was 4-47 times that of FCM and faster than SPFCM on each of the six datasets used in experiments. For five of the datasets, partitions were within 1% of those of FCM. MSERFCM's speedup was 5-26 times that of FCM and produced partitions within 3% of those of FCM on all datasets. A unique dataset, consisting of plankton images, exposed the strengths and weaknesses of many of the algorithms tested. It is shown that the new stopping criterion is effective in speeding up algorithms such as SPFCM and the final partitions are very close to those of FCM.
许多旨在加速模糊 c 均值(FCM)聚类算法的算法会对数据进行随机采样。通常,尽管子样本大小会对速度和质量产生影响,但并未使用统计方法来估计子样本大小。本文介绍了两种新的加速算法,即 GOFCM 和 MSERFCM,它们使用统计方法来估计子样本大小。GOFCM 是 SPFCM 的一种变体,还利用了渐进采样。MSERFCM 是 rseFCM 的一种变体,通过改进初始化实现了加速。引入了一种通用的、新颖的加速聚类停止准则。将新算法与 FCM 以及 FCM 的四种加速变体进行了比较。在实验使用的六个数据集中的每一个上,GOFCM 的加速比 FCM 快 4 至 47 倍,并且比 SPFCM 更快。对于其中五个数据集,划分结果与 FCM 的划分结果相差在 1%以内。MSERFCM 的加速比 FCM 快 5 至 26 倍,并且在所有数据集上产生的划分结果与 FCM 的划分结果相差在 3%以内。一个由浮游生物图像组成的独特数据集揭示了许多测试算法的优缺点。结果表明,新的停止准则在加速诸如 SPFCM 等算法方面是有效的,并且最终划分结果与 FCM 的划分结果非常接近。