遗传K均值算法

Genetic K-means algorithm.

作者信息

Krishna K, Narasimha Murty M

机构信息

Dept. of Electr. Eng., Indian Inst. of Sci., Bangalore.

出版信息

IEEE Trans Syst Man Cybern B Cybern. 1999;29(3):433-9. doi: 10.1109/3477.764879.

DOI:10.1109/3477.764879

PMID:18252317

Abstract

In this paper, we propose a novel hybrid genetic algorithm (GA) that finds a globally optimal partition of a given data into a specified number of clusters. GA's used earlier in clustering employ either an expensive crossover operator to generate valid child chromosomes from parent chromosomes or a costly fitness function or both. To circumvent these expensive operations, we hybridize GA with a classical gradient descent algorithm used in clustering, viz. K-means algorithm. Hence, the name genetic K-means algorithm (GKA). We define K-means operator, one-step of K-means algorithm, and use it in GKA as a search operator instead of crossover. We also define a biased mutation operator specific to clustering called distance-based-mutation. Using finite Markov chain theory, we prove that the GKA converges to the global optimum. It is observed in the simulations that GKA converges to the best known optimum corresponding to the given data in concurrence with the convergence result. It is also observed that GKA searches faster than some of the other evolutionary algorithms used for clustering.

摘要

在本文中，我们提出了一种新颖的混合遗传算法（GA），该算法可将给定数据全局最优地划分为指定数量的簇。早期用于聚类的遗传算法要么采用代价高昂的交叉算子从父染色体生成有效的子染色体，要么采用代价高昂的适应度函数，或者两者兼而有之。为了规避这些代价高昂的操作，我们将遗传算法与聚类中使用的经典梯度下降算法（即K均值算法）进行了混合。因此，该算法被称为遗传K均值算法（GKA）。我们定义了K均值算子，即K均值算法的一步，并在GKA中使用它作为搜索算子而非交叉算子。我们还定义了一种特定于聚类的有偏变异算子，称为基于距离的变异。使用有限马尔可夫链理论，我们证明了GKA收敛到全局最优解。在模拟中观察到，GKA与收敛结果一致，收敛到对应给定数据的最知名最优解。还观察到，GKA的搜索速度比用于聚类的其他一些进化算法更快。