Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720; Cylink Corporation, Sunnyvale, CA 94087.
IEEE Trans Pattern Anal Mach Intell. 1987 May;9(5):661-8. doi: 10.1109/tpami.1987.4767960.
The fuzzy c-means/ISODATA algorithm is usually described in terms of clustering a finite data set. An equivalent point of view is that the algorithm clusters the support points of a finite-support probability distribution. Motivated by recent work on the hard version of the algorithm, this paper extends the definition to arbitrary distributions and considers asymptotic properties. It is shown that fixed points of the algorithm are stationary points of the fuzzy objective functional, and vice versa. When the algorithm is iteratively applied to an initial prototype set, the sequence of prototype sets produced approaches the set of fixed points. If an unknown distribution is approximated by the empirical distribution of stationary, ergodic observations, then as the number of observations grows large, fixed points of the algorithm based on the empirical distribution approach fixed points of the algorithm based on the true distribution. Furthermore, with respect to minimizing the fuzzy objective functional, the algorithm based on the empirical distribution is asymptotically at least as good as the algorithm based on the true distribution.
模糊 C-均值/ISODATA 算法通常是根据对有限数据集进行聚类来描述的。等价的观点是,该算法对有限支持概率分布的支持点进行聚类。受最近关于硬版本算法的研究的启发,本文将定义扩展到任意分布,并考虑了渐近性质。结果表明,算法的平衡点是模糊目标函数的驻点,反之亦然。当算法被迭代应用于初始原型集时,产生的原型集序列趋近于平衡点集。如果用平稳、遍历观测的经验分布来近似未知分布,那么随着观测数量的增加,基于经验分布的算法的平衡点趋近于基于真实分布的算法的平衡点。此外,在最小化模糊目标函数方面,基于经验分布的算法在渐近意义上至少与基于真实分布的算法一样好。