Benchmarking validity indices for evolutionary K-means clustering performance.

Authors

Ikotun Abiodun M, Habyarimana Faustin, Ezugwu Absalom E

Affiliations

School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, Pietermaritzburg Campus, Durban, South Africa.

Unit for Data Science and Computing, North-West University, 11 Hoffman Street, Potchefstroom, 2520, South Africa.

Publication information

Sci Rep. 2025 Jul 1;15(1):21842. doi: 10.1038/s41598-025-08473-6.

Abstract

K-Means is a well-established clustering algorithm widely used in data analysis and various real-world applications. However, its requirement for a predefined number of clusters limits its effectiveness in automatic clustering tasks. To address this, metaheuristic optimisation algorithms have been integrated into K-Means, leading to the development of Evolutionary K-Means clustering approaches. These methods often rely on internal validity indices as fitness functions to automatically determine both the optimal number of clusters and the clustering configuration. However, the effectiveness of internal validity indices is often data-dependent, as most are tailored to specific data characteristics. Consequently, the choice of validity index can significantly influence clustering outcomes. This study evaluates the performance of fifteen internal validity indices within the Enhanced Firefly Algorithm-K-Means (FA-K-Means) framework, an evolutionary approach that integrates Firefly metaheuristics with the classical K-Means algorithm. The performance of each index is assessed across a diverse collection of real-life and synthetic datasets with varying structures. The results reveal that the Calinski-Harabasz (CH) and Silhouette indices consistently outperform others, offering more reliable clustering performance. These findings provide practical guidance for selecting appropriate fitness functions in Evolutionary K-Means algorithms for automatic clustering tasks.
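To make the idea of a validity index acting as a fitness function concrete, the following is a minimal sketch in plain Python. It is not the FA-K-Means method evaluated in the study: the Firefly search is replaced here by a simple scan over candidate values of k, and the Calinski-Harabasz index scores each K-Means partition. All function names (`kmeans`, `calinski_harabasz`, `select_k`, `farthest_point_init`), the deterministic farthest-point seeding, and the toy data are assumptions made for illustration only.

```python
from itertools import product


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from those chosen."""
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return chosen


def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centers[j] = centroid(members)
    return labels


def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)]; larger values are better."""
    n = len(points)
    overall = centroid(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        return float("-inf")  # the index is undefined for these cases
    between = within = 0.0
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)  # between-cluster dispersion B
        within += sum(sq_dist(p, c) for p in members)  # within-cluster dispersion W
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def select_k(points, k_values):
    """Return the k whose K-Means partition has the highest CH score."""
    scores = {k: calinski_harabasz(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data: three 3x3 grids of points around well-separated centres.
offsets = (-0.5, 0.0, 0.5)
data = [(cx + dx, cy + dy)
        for cx, cy in [(0, 0), (10, 0), (5, 9)]
        for dx, dy in product(offsets, offsets)]
best_k, scores = select_k(data, range(2, 6))
```

On this toy data the CH score peaks at the true number of groups, so `best_k` is 3. In an evolutionary setting, the scan in `select_k` would be replaced by a metaheuristic search that proposes candidate solutions and uses the same index as its fitness value; swapping `calinski_harabasz` for another internal index changes only the scoring function.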

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e861/12218181/4eb069661408/41598_2025_8473_Fig1_HTML.jpg
