Bioinformatics and Computational Biology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.
Biostatistics Department, Medical School, Shiraz University of Medical Sciences, Shiraz, Iran.
Comput Math Methods Med. 2020 Aug 1;2020:7636857. doi: 10.1155/2020/7636857. eCollection 2020.
Random selection of initial centroids (centers) for clusters is a fundamental defect of the k-means clustering algorithm, because the algorithm's performance depends on the initial centroids and it may end up in a local optimum. Various hybrid methods have been introduced to resolve this defect in the k-means clustering algorithm. Because no comparative studies have examined these methods from various aspects, the present paper compared three hybrid methods that combine the k-means clustering algorithm with concepts from the genetic algorithm, the minimum spanning tree, and hierarchical clustering. Although these three hybrid methods have received more attention in previous research, few studies have compared their results. Hence, seven quantitative datasets with different characteristics in terms of sample size, number of features, and number of classes were used in the present study. Eleven external and internal evaluation indices were also considered for comparing the methods. The results indicated that the hybrid methods reached the final solution with a higher convergence rate than the ordinary k-means method. Furthermore, the hybrid method based on hierarchical clustering converged to the optimal solution in fewer iterations than the other two hybrid methods. However, the hybrid methods based on minimum spanning trees and genetic algorithms may not always, or even often, be more effective than the ordinary k-means method. Therefore, despite their added computational complexity, these three hybrid methods have not led to much improvement over the k-means method. Nevertheless, a simulation study is required to compare the methods and complete the conclusion.
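To make the initialization problem concrete, the following is a minimal, illustrative sketch (not the authors' implementation) contrasting plain k-means seeded with randomly chosen data points against one hybrid idea: running agglomerative hierarchical clustering down to k clusters and using those cluster means as the initial centroids. The centroid-linkage choice, the helper names (`random_init`, `hierarchical_init`, `kmeans`, `sse`), and the synthetic three-blob data are assumptions made for illustration; the genetic-algorithm and minimum-spanning-tree hybrids are not shown. The sketch reports the number of Lloyd assignment passes and the within-cluster sum of squared errors (SSE), one common internal index.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty group of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data: Sequence[Point], init: Sequence[Point], max_iter: int = 100):
    """Lloyd's k-means from the given initial centroids.

    Returns (labels, centroids, passes); passes counts assignment steps until
    the labels stop changing (or max_iter is reached).
    """
    centroids: List[Point] = list(init)
    labels: List[int] = []
    for it in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                      for p in data]
        if new_labels == labels:
            return labels, centroids, it
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids, max_iter


def random_init(data: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Ordinary k-means seeding: k distinct data points chosen at random."""
    return rng.sample(list(data), k)


def hierarchical_init(data: Sequence[Point], k: int) -> List[Point]:
    """Hypothetical hybrid seeding: merge clusters by centroid linkage until
    k remain, then return their means as initial k-means centroids."""
    clusters = [[p] for p in data]
    while len(clusters) > k:
        means = [mean(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(means[i], means[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return [mean(c) for c in clusters]


def sse(data: Sequence[Point], labels: Sequence[int], centroids: Sequence[Point]) -> float:
    """Within-cluster sum of squared errors (an internal evaluation index)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))


if __name__ == "__main__":
    rng = random.Random(0)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # synthetic, illustrative data
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(20)]
    for name, init in [("random", random_init(data, 3, rng)),
                       ("hierarchical", hierarchical_init(data, 3))]:
        labels, cents, passes = kmeans(data, init)
        print(f"{name:>12}: passes={passes}, SSE={sse(data, labels, cents):.3f}")
```

The naive agglomeration here is cubic in the number of points, which mirrors the abstract's point that hybrid seeding adds computational cost; on well-separated data it hands k-means centroids that are already close to the final solution, so Lloyd's loop needs few passes, whereas random seeding can need more passes or stop at a worse local optimum.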