A clustering effectiveness measurement model based on merging similar clusters.

Author information

Duan Guiqin, Zou Chensong

Affiliations

School of Computer and Information Engineering, Guangdong Songshan Vocational and Technical College, Shaoguan, China.

Shaoguan Ecological and Cultural Big Data Engineering & Research Center, Shaoguan, China.

Publication information

PeerJ Comput Sci. 2024 Feb 29;10:e1863. doi: 10.7717/peerj-cs.1863. eCollection 2024.

Abstract

This article presents a clustering effectiveness measurement model based on merging similar clusters. It addresses problems that the affinity propagation (AP) algorithm encounters during clustering, such as excessive local clustering and low accuracy, as well as the invalid clustering evaluation results that arise when the proportion of clusters is very high and some internal evaluation indices lack variety. First, building on the "rough clustering" produced by the AP clustering algorithm, similar clusters are merged according to the relationship between the similarity of any two clusters and the average inter-cluster similarity over the entire sample set, which decreases the maximum number of clusters. Then, a new scheme is proposed to calculate intra-cluster compactness, inter-cluster relative density, and the inter-cluster overlap coefficient. On the basis of this scheme, several internal evaluation indices based on intra-cluster cohesion and inter-cluster dispersion are designed. Experiments on the public UCI and NSL-KDD datasets show that the proposed model performs clustering and classification correctly and provides accurate ranges for clustering, and that it significantly outperforms the three improved clustering algorithms it was compared against on intrusion detection indices such as detection rate and false positive rate (FPR).
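The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the similarity measure or the exact merge rule, so the sketch assumes similarity is the negative Euclidean distance between cluster centroids, that two clusters are merged when their similarity exceeds the mean over all cluster pairs, and that merges are transitive. The function names `centroids` and `merge_similar_clusters` are hypothetical, and the input labels stand in for the "rough clustering" that AP would produce.

```python
from itertools import combinations
from math import dist


def centroids(points, labels):
    """Return the mean point of each cluster, keyed by cluster label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps))
            for lab, ps in groups.items()}


def merge_similar_clusters(points, labels):
    """Merge clusters that are more similar than the average cluster pair.

    Assumption (not from the paper): similarity between two clusters is the
    negative Euclidean distance between their centroids.
    """
    cents = centroids(points, labels)
    pairs = list(combinations(sorted(cents), 2))
    if not pairs:
        return list(labels)  # zero or one cluster: nothing to merge

    sim = {(a, b): -dist(cents[a], cents[b]) for a, b in pairs}
    mean_sim = sum(sim.values()) / len(sim)

    # Union-find so that merges chain transitively (an assumed design choice).
    parent = {lab: lab for lab in cents}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), s in sim.items():
        if s > mean_sim:
            parent[find(b)] = find(a)

    return [find(lab) for lab in labels]


# Toy rough clustering: clusters 0 and 1 are close, cluster 2 is far away.
points = [(0, 0), (0, 0.2), (0.5, 0), (0.5, 0.2), (10, 10), (10, 10.2)]
labels = [0, 0, 1, 1, 2, 2]
print(merge_similar_clusters(points, labels))  # [0, 0, 0, 0, 2, 2]
```

In this toy input only the pair (0, 1) is more similar than average, so three rough clusters collapse to two. A single pass is shown; whether the published method repeats the merge until no pair exceeds the threshold is not stated in the abstract.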

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6019/10909172/982a4c21df21/peerj-cs-10-1863-g001.jpg
