Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study.

Affiliations

Bioinformatics and Computational Biology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.

Biostatistics Department, Medical School, Shiraz University of Medical Sciences, Shiraz, Iran.

Publication information

Comput Math Methods Med. 2020 Aug 1;2020:7636857. doi: 10.1155/2020/7636857. eCollection 2020.

DOI:10.1155/2020/7636857
PMID:32802153
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7416251/
Abstract

Random selection of initial centroids (centers) for clusters is a fundamental defect of the K-means clustering algorithm, because the algorithm's performance depends on the initial centroids and it may converge to a local optimum. Various hybrid methods have been introduced to resolve this defect. Because no comparative studies have compared these methods across multiple aspects, the present paper compared the K-means clustering algorithm with three hybrid methods based on the genetic algorithm, the minimum spanning tree, and hierarchical clustering. Although these three hybrid methods have received more attention in previous research, few studies have compared their results. Hence, seven quantitative datasets with different characteristics in terms of sample size, number of features, and number of classes were used in the present study. Eleven external and internal evaluation indices were also considered for comparing the methods. The results indicated that the hybrid methods converged to the final solution faster than the ordinary K-means method. Furthermore, the hybrid method with hierarchical clustering converged to the optimal solution in fewer iterations than the other two hybrid methods. However, the hybrid methods with minimum spanning trees and genetic algorithms were not always, or even often, more effective than the ordinary K-means method. Therefore, despite their computational complexity, these three hybrid methods did not lead to much improvement over the K-means method. However, a simulation study is required to compare the methods and complete the conclusion.
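
The idea of seeding K-means from a hierarchical clustering, and then counting iterations to convergence, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state the linkage rule, so centroid linkage is assumed here, and the function names (`hierarchical_init`, `kmeans`, `make_blobs`) and the toy data are hypothetical, chosen only for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def hierarchical_init(points, k):
    """Merge the two clusters with the closest centroids until k remain
    (centroid linkage, an assumption), then return the k cluster means."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        cents = [mean(c) for c in clusters]
        _, i, j = min(
            (sq_dist(cents[i], cents[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] += clusters[j]  # j > i, so deleting j leaves index i valid
        del clusters[j]
    return [mean(c) for c in clusters]

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm from the given seeds.
    Returns (centroids, labels, iterations until labels stop changing)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            return centroids, labels, iteration
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return centroids, labels, max_iter

def make_blobs(seed=0, per_blob=20):
    """Toy data: three well-separated 2-D blobs (not the paper's datasets)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    return [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in centers for _ in range(per_blob)]

points = make_blobs()
_, _, h_iters = kmeans(points, hierarchical_init(points, 3))
_, _, r_iters = kmeans(points, random.Random(1).sample(points, 3))
print("iterations with hierarchical seeds:", h_iters)
print("iterations with random seeds:", r_iters)
```

On well-separated toy data like this, the hierarchical seeds already sit at the blob means, so K-means only needs to confirm the assignment. Real datasets, such as the seven used in the study, need not behave this cleanly, and the quadratic pair search in `hierarchical_init` reflects the extra computational cost the abstract mentions.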


Similar articles

1
Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study.
Comput Math Methods Med. 2020 Aug 1;2020:7636857. doi: 10.1155/2020/7636857. eCollection 2020.
2
Boosting k-means clustering with symbiotic organisms search for automatic clustering problems.
PLoS One. 2022 Aug 11;17(8):e0272861. doi: 10.1371/journal.pone.0272861. eCollection 2022.
3
An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.
Comput Biol Med. 2017 Dec 1;91:213-221. doi: 10.1016/j.compbiomed.2017.10.014. Epub 2017 Oct 23.
4
Spectral clustering strategies for heterogeneous disease expression data.
Pac Symp Biocomput. 2013:212-23.
5
A Combination of Particle Swarm Optimization and Minkowski Weighted K-Means Clustering: Application in Lateralization of Temporal Lobe Epilepsy.
Brain Topogr. 2020 Jul;33(4):519-532. doi: 10.1007/s10548-020-00770-9. Epub 2020 Apr 28.
6
Incremental genetic K-means algorithm and its application in gene expression data analysis.
BMC Bioinformatics. 2004 Oct 28;5:172. doi: 10.1186/1471-2105-5-172.
7
Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters.
PLoS One. 2016 Mar 25;11(3):e0152333. doi: 10.1371/journal.pone.0152333. eCollection 2016.
8
Microarray data clustering based on temporal variation: FCV with TSD preclustering.
Appl Bioinformatics. 2003;2(1):35-45.
9
A Novel Model on Reinforce K-Means Using Location Division Model and Outlier of Initial Value for Lowering Data Cost.
Entropy (Basel). 2020 Aug 17;22(8):902. doi: 10.3390/e22080902.
10
A hybrid monkey search algorithm for clustering analysis.
ScientificWorldJournal. 2014 Mar 4;2014:938239. doi: 10.1155/2014/938239. eCollection 2014.
