Duan Guiqin, Zou Chensong
School of Computer and Information Engineering, Guangdong Songshan Vocational and Technical College, Shaoguan, China.
Shaoguan Ecological and Cultural Big Data Engineering & Research Center, Shaoguan, China.
PeerJ Comput Sci. 2024 Feb 29;10:e1863. doi: 10.7717/peerj-cs.1863. eCollection 2024.
This article presents a clustering effectiveness measurement model based on merging similar clusters. It addresses problems that the affinity propagation (AP) algorithm encounters during clustering: excessive local clustering, low accuracy, and invalid clustering evaluation results, which arise because some internal evaluation indices lack variety when the proportion of clusters is very high. First, building on the "rough clustering" produced by the AP algorithm, similar clusters are merged according to the relationship between the similarity of any two clusters and the average inter-cluster similarity over the entire sample set, which lowers the maximum number of clusters. Then, a new scheme is proposed for calculating intra-cluster compactness, inter-cluster relative density, and the inter-cluster overlap coefficient. On the basis of this scheme, several internal evaluation indices based on intra-cluster cohesion and inter-cluster dispersion are designed. Experiments on public UCI and NSL-KDD datasets show that the proposed model clusters and classifies correctly and provides accurate ranges for clustering, and that it is significantly superior to the three improved clustering algorithms it was compared with on intrusion detection indices such as detection rate and false positive rate (FPR).
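The merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact similarity measure or merge rule, so the code below makes two assumptions that are not taken from the paper: similarity between two clusters is the negative Euclidean distance between their centroids, and two clusters are merged when their similarity is strictly greater than the average over all cluster pairs. The function name `merge_similar_clusters` is hypothetical, and the input labels stand in for the output of an AP "rough clustering" pass.

```python
import math
from itertools import combinations


def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def merge_similar_clusters(points, labels):
    """Merge clusters that are more similar than the average cluster pair.

    Assumptions (illustrative only, not the paper's definitions):
    - similarity = negative Euclidean distance between cluster centroids
    - merge rule = similarity strictly greater than the mean pairwise similarity
    Returns a new label list; merged clusters share the smaller original label.
    """
    # Group points by their rough-clustering label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    ids = sorted(clusters)
    cents = {i: centroid(clusters[i]) for i in ids}

    # Pairwise inter-cluster similarity and its average over all pairs.
    sims = {(a, b): -math.dist(cents[a], cents[b])
            for a, b in combinations(ids, 2)}
    if not sims:  # fewer than two clusters: nothing to merge
        return list(labels)
    avg_sim = sum(sims.values()) / len(sims)

    # Union-find so that chains of similar clusters end up in one group.
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (a, b), sim in sims.items():
        if sim > avg_sim:
            parent[find(b)] = find(a)

    return [find(label) for label in labels]
```

For example, `merge_similar_clusters([(0, 0), (0, 1), (10, 10), (10, 11)], [0, 1, 2, 3])` returns `[0, 0, 2, 2]`: the two nearby pairs are merged, while the distant groups stay separate. In this sketch a single merge pass is applied; whether the paper iterates the merge or recomputes the average after each merge is not stated in the abstract.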