

Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study.

Affiliations

Bioinformatics and Computational Biology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.

Biostatistics Department, Medical School, Shiraz University of Medical Sciences, Shiraz, Iran.

Publication information

Comput Math Methods Med. 2020 Aug 1;2020:7636857. doi: 10.1155/2020/7636857. eCollection 2020.

Abstract

Random selection of initial centroids (centers) for clusters is a fundamental defect of the k-means clustering algorithm, because the algorithm's performance depends on the initial centroids and it may end up in a local optimum. Various hybrid methods have been introduced to resolve this defect. Because there are no studies comparing these methods in various aspects, the present paper compared three hybrid methods that combine the k-means clustering algorithm with concepts from the genetic algorithm, the minimum spanning tree, and hierarchical clustering. Although these three hybrid methods have received more attention in previous research, few studies have compared their results. Hence, seven quantitative datasets with different characteristics in terms of sample size, number of features, and number of classes were used in the present study. Eleven external and internal evaluation indices were also considered for comparing the methods. The results indicated that the hybrid methods had a higher convergence rate in reaching the final solution than the ordinary k-means method. Furthermore, the hybrid method based on hierarchical clustering converged to the optimal solution in fewer iterations than the other two hybrid methods. However, the hybrid methods based on minimum spanning trees and genetic algorithms may not always, or even often, be more effective than the ordinary k-means method. Therefore, despite their added computational complexity, these three hybrid methods have not led to much improvement over the k-means method. Nevertheless, a simulation study is required to compare the methods further and complete the conclusion.
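The abstract contrasts ordinary k-means, seeded with randomly chosen centroids, with hybrids that compute the initial centroids first. As a minimal sketch of the hierarchical-clustering hybrid only (assuming Euclidean distance and centroid linkage, neither of which the abstract specifies; all function names are hypothetical), the pure-Python code below runs Lloyd's k-means from either k random data points or the means of k clusters produced by agglomerative clustering, and reports how many iterations each needs to converge. The minimum spanning tree and genetic algorithm hybrids are not shown.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means starting from the given centroids.

    Returns (labels, centroids, iterations); iterations counts assignment
    rounds, including the final round that confirms the labels are stable.
    """
    centroids = list(centroids)
    labels = None
    for it in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            return labels, centroids, it
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean(members)
    return labels, centroids, max_iter


def random_init(points, k, seed=0):
    """Ordinary k-means seeding: k distinct data points chosen at random."""
    return random.Random(seed).sample(points, k)


def hierarchical_init(points, k):
    """Hypothetical hybrid seeding: means of k agglomerative clusters.

    Assumes centroid linkage (repeatedly merge the two clusters whose means
    are closest). Naive O(n^3) loop, suitable for small datasets only.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        means = [mean(c) for c in clusters]
        pairs = ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: dist2(means[ab[0]], means[ab[1]]))
        clusters[i].extend(clusters.pop(j))  # i < j, so index i stays valid
    return [mean(c) for c in clusters]


if __name__ == "__main__":
    rng = random.Random(42)
    # Three synthetic 2-D blobs (illustrative data, not the paper's datasets).
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
    for name, init in [("random", random_init(data, 3, seed=1)),
                       ("hierarchical", hierarchical_init(data, 3))]:
        labels, cents, iters = kmeans(data, init)
        sse = sum(dist2(p, cents[lab]) for p, lab in zip(data, labels))
        print(f"{name:12s} iterations={iters} SSE={sse:.2f}")
```

Because the agglomerative pass already yields a partition close to the final one, k-means seeded this way often needs few reassignment rounds; the price is the extra cost of the merge loop, which echoes the abstract's point that the hybrids add computational complexity.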


