Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.
School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China.
Sensors (Basel). 2021 Mar 8;21(5):1892. doi: 10.3390/s21051892.
Kernel fuzzy c-means (KFCM) is a significantly improved version of fuzzy c-means (FCM) for processing linearly inseparable datasets. However, for fuzzification parameter m = 1, the KFCM problem cannot be solved by Lagrangian optimization. To solve this problem, an equivalent model, called kernel probabilistic k-means (KPKM), is proposed here. The novel model relates KFCM to kernel k-means (KKM) in a unified mathematical framework. Moreover, the proposed KPKM can be solved by the active gradient projection (AGP) method, a nonlinear programming technique for problems with linear equality and linear inequality constraints. To accelerate the AGP method, a fast AGP (FAGP) algorithm was designed. The proposed FAGP uses a maximum-step strategy to estimate the step length and an iterative method to update the projection matrix. Experiments comparing KPKM with KFCM, KKM, FCM and k-means demonstrated the effectiveness of the proposed method. On synthetic datasets, KPKM was able to find nonlinearly separable structures. On ten real UCI datasets, KPKM had better clustering performance on at least six. FAGP requires less running time than the original AGP, reducing running time by 76-95% on the real datasets.
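To illustrate why m = 1 is a problem for the Lagrangian approach, the following minimal sketch uses the textbook feature-space form of KFCM, which is an assumption here and not necessarily the exact formulation used in the paper. The Lagrangian membership update raises squared distances to the power -1/(m - 1), which is undefined at m = 1. The function names (rbf_kernel, feature_space_sq_dists, kfcm_membership_update) and the RBF kernel choice are hypothetical, introduced only for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def feature_space_sq_dists(K, U, m):
    """Squared distances ||phi(x_j) - v_i||^2 for memberships U of shape (n, c).

    Each centre v_i is the u^m-weighted mean of the mapped points, so the
    distance can be expanded using kernel values only.
    """
    W = U ** m
    W = W / W.sum(axis=0, keepdims=True)   # normalise weights per cluster
    KW = K @ W                             # cross terms, shape (n, c)
    quad = np.sum(W * KW, axis=0)          # ||v_i||^2 for each cluster
    d2 = np.diag(K)[:, None] - 2.0 * KW + quad[None, :]
    return np.maximum(d2, 1e-12)           # guard against division by zero

def kfcm_membership_update(d2, m):
    """Lagrangian membership update: u_ji proportional to d2_ji^(-1/(m-1))."""
    if m <= 1.0:
        # The exponent -1/(m-1) is undefined at m = 1, so this closed-form
        # update does not exist there.
        raise ValueError("closed-form update requires m > 1")
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(3.0, 0.1, (10, 2))])
    K = rbf_kernel(X, gamma=0.5)
    U = rng.dirichlet(np.ones(2), size=X.shape[0])   # random feasible start
    for _ in range(20):
        U = kfcm_membership_update(feature_space_sq_dists(K, U, 2.0), 2.0)
    print(U.round(3))
```

Each row of U stays on the probability simplex (non-negative entries summing to one). These are the linear equality and inequality constraints that a gradient projection method has to handle directly when the closed-form update is unavailable.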