Ananthi M, Valarmathi K, Ramathilagam A, Praveen R
Department of Computer Science and Business Systems, Sri Sairam Engineering College, Chennai, Tamilnadu, 600064, India.
Department of Electronics and Communication Engineering, P.S.R. Engineering College, Sivakasi, Tamilnadu, 626140, India.
Sci Rep. 2025 Jul 1;15(1):22343. doi: 10.1038/s41598-025-07404-9.
In a dynamic data stream environment, real-time exploration of big data cannot rely on tracking every individual historical data point, because doing so is highly memory-expensive. A data stream clustering method is therefore essential for extracting and storing the potentially useful information from historical data in a single pass. Dynamic clustering algorithms need to satisfy two requirements: concept drift and concept evolution. They must also handle changes in the association between the object attributes within each cluster. In this paper, a Hybrid Lion and Exponential PSO-based Metaheuristic Clustering Approach (HLEPSOMCA) is proposed to satisfy the requirements of concept drift and concept evolution for efficient dynamic data stream management. The approach is designed for good scalability and a minimal number of parameters with respect to the number of clusters and the high dimensionality of the data. It combines stochastic optimization with deterministic clustering techniques to position the cluster centres optimally. It further adopts density-based clustering strategies to determine micro-clusters, so that the Lion algorithm and Exponential PSO can be applied in the initialization phase to maximize performance. Experimental results on the KDD-99 dataset confirm that the purity achieved by HLEPSOMCA is, on average, 13.24% better than that of the baseline approaches used for comparison.
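The headline result is stated in terms of cluster purity. The abstract does not give the exact formula used, so the following is only a minimal sketch that assumes the standard definition: each cluster is credited with the count of its most frequent ground-truth class, and the credited counts are divided by the total number of points. The function name `purity` and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Standard clustering purity (assumed definition, not the paper's code).

    cluster_ids: cluster assigned to each point by the clustering method.
    true_labels: ground-truth class of each point (e.g. traffic type in KDD-99).
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("purity is undefined for an empty input")

    # Count how often each true class occurs inside each cluster.
    class_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        class_counts[cluster][label] += 1

    # Each cluster contributes the size of its majority class.
    majority_total = sum(max(counts.values()) for counts in class_counts.values())
    return majority_total / len(cluster_ids)


# Toy example: two clusters of three points, each with one minority point.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["normal", "normal", "attack", "attack", "attack", "normal"]
score = purity(clusters, labels)
```

In the toy example each cluster has a majority class covering two of its three points, so the score is 4/6. In a streaming setting, purity is commonly computed over a recent window of points rather than over the whole stream; whether the paper does so is not stated in the abstract.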