A Validity Index for Prototype-Based Clustering of Data Sets With Complex Cluster Structures.

Authors

Tasdemir K, Merenyi E

Publication details

IEEE Trans Syst Man Cybern B Cybern. 2011 Aug;41(4):1039-53. doi: 10.1109/TSMCB.2010.2104319. Epub 2011 Feb 4.

DOI:10.1109/TSMCB.2010.2104319
PMID:21296711
Abstract

Evaluation of how well the extracted clusters fit the true partitions of a data set is one of the fundamental challenges in unsupervised clustering because the data structure and the number of clusters are unknown a priori. Cluster validity indices are commonly used to select the best partitioning from different clustering results; however, they are often inadequate unless clusters are well separated or have parametric shapes. Prototype-based clustering (finding clusters by grouping the prototypes obtained by vector quantization of the data), which is becoming increasingly important for its effectiveness in the analysis of large, high-dimensional data sets, adds another dimension to this challenge. For validity assessment of prototype-based clusterings, previously proposed indices (mostly devised for the evaluation of point-based clusterings) usually perform poorly, and their performance degrades further when they are applied to large data sets with complicated cluster structure. In this paper, we propose a new index, Conn_Index, which can be applied to data sets with a wide variety of clusters of different shapes, sizes, densities, or overlaps. We construct Conn_Index based on inter- and intra-cluster connectivities of prototypes. Connectivities are defined through a "connectivity matrix", which is a weighted Delaunay graph where the weights indicate the local data distribution. Experiments on synthetic and real data indicate that Conn_Index outperforms the existing validity indices considered in this paper for the evaluation of prototype-based clustering results.
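The abstract describes the connectivity matrix only as a weighted Delaunay graph whose weights reflect the local data distribution. The Python sketch below assumes one common way to build such a graph from vector-quantization prototypes: CADJ(i, j) counts the data points whose nearest prototype is i and second-nearest is j, and CONN = CADJ + CADJ^T. It then combines intra- and inter-cluster connectivity as Intra × (1 − Inter). The function names, the toy data, and especially the simplified inter-cluster term are illustrative assumptions, not the paper's exact Conn_Index definition.

```python
import math
from typing import List, Sequence

Matrix = List[List[int]]


def cadj_matrix(data: Sequence[Sequence[float]],
                prototypes: Sequence[Sequence[float]]) -> Matrix:
    """CADJ[i][j] = number of data points whose nearest prototype is i and
    second-nearest prototype is j (assumed weighting, not from the abstract)."""
    n = len(prototypes)
    if n < 2:
        raise ValueError("need at least two prototypes")
    cadj = [[0] * n for _ in range(n)]
    for x in data:
        # Rank all prototypes by Euclidean distance to the data point x.
        ranked = sorted(range(n), key=lambda k: math.dist(x, prototypes[k]))
        cadj[ranked[0]][ranked[1]] += 1
    return cadj


def conn_matrix(cadj: Matrix) -> Matrix:
    """Symmetric connectivity weights: CONN = CADJ + CADJ^T."""
    n = len(cadj)
    return [[cadj[i][j] + cadj[j][i] for j in range(n)] for i in range(n)]


def conn_index_sketch(cadj: Matrix, labels: Sequence[int]) -> float:
    """Hypothetical simplified index: mean intra-cluster connectivity times
    (1 - mean inter-cluster connectivity). Not the paper's exact formula."""
    conn = conn_matrix(cadj)
    n = len(labels)
    clusters = sorted(set(labels))
    intra_scores, inter_scores = [], []
    for k in clusters:
        members = [i for i in range(n) if labels[i] == k]
        # Intra: share of CADJ weight from cluster k's prototypes that stays in k.
        out_total = sum(cadj[i][j] for i in members for j in range(n))
        within = sum(cadj[i][j] for i in members for j in members)
        intra_scores.append(within / out_total if out_total else 0.0)
        # Inter (simplified): largest share of k's CONN weight linking to one other cluster.
        conn_total = sum(conn[i][j] for i in members for j in range(n))
        strongest = 0.0
        for other in clusters:
            if other == k or conn_total == 0:
                continue
            link = sum(conn[i][j] for i in members for j in range(n) if labels[j] == other)
            strongest = max(strongest, link / conn_total)
        inter_scores.append(strongest)
    intra = sum(intra_scores) / len(clusters)
    inter = sum(inter_scores) / len(clusters)
    return intra * (1.0 - inter)


# Toy example: two well-separated 1-D groups, two prototypes per group.
data = [[0.0], [0.1], [0.2], [0.3], [10.0], [10.1], [10.2], [10.3]]
prototypes = [[0.0], [0.3], [10.0], [10.3]]
cadj = cadj_matrix(data, prototypes)
print(conn_index_sketch(cadj, [0, 0, 1, 1]))  # prints 1.0 (labels follow the true groups)
print(conn_index_sketch(cadj, [0, 1, 0, 1]))  # prints 0.0 (labels mix the two groups)
```

A labeling that keeps strongly connected prototypes together scores higher than one that splits them, which is the behavior the abstract attributes to connectivity-based validity; the values from this toy example say nothing about the paper's reported results.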

Similar articles

1. A Validity Index for Prototype-Based Clustering of Data Sets With Complex Cluster Structures.
IEEE Trans Syst Man Cybern B Cybern. 2011 Aug;41(4):1039-53. doi: 10.1109/TSMCB.2010.2104319. Epub 2011 Feb 4.
2. Topology-based hierarchical clustering of self-organizing maps.
IEEE Trans Neural Netw. 2011 Mar;22(3):474-85. doi: 10.1109/TNN.2011.2107527.
3. A Novel Cluster Validity Index Based on Local Cores.
IEEE Trans Neural Netw Learn Syst. 2019 Apr;30(4):985-999. doi: 10.1109/TNNLS.2018.2853710. Epub 2018 Aug 2.
4. A Link-Based Approach to the Cluster Ensemble Problem.
IEEE Trans Pattern Anal Mach Intell. 2011 Dec;33(12):2396-409. doi: 10.1109/TPAMI.2011.84. Epub 2011 May 12.
5. An interactive approach to multiobjective clustering of gene expression patterns.
IEEE Trans Biomed Eng. 2013 Jan;60(1):35-41. doi: 10.1109/TBME.2012.2220765. Epub 2012 Sep 28.
6. A Comparison Study of Validity Indices on Swarm-Intelligence-Based Clustering.
IEEE Trans Syst Man Cybern B Cybern. 2012 Aug;42(4):1243-56. doi: 10.1109/TSMCB.2012.2188509. Epub 2012 Mar 15.
7. Analysis of a Gibbs sampler method for model-based clustering of gene expression data.
Bioinformatics. 2008 Jan 15;24(2):176-83. doi: 10.1093/bioinformatics/btm562. Epub 2007 Nov 22.
8. Self-splitting competitive learning: a new on-line clustering paradigm.
IEEE Trans Neural Netw. 2002;13(2):369-80. doi: 10.1109/72.991422.
9. LCE: a link-based cluster ensemble method for improved gene expression data analysis.
Bioinformatics. 2010 Jun 15;26(12):1513-9. doi: 10.1093/bioinformatics/btq226. Epub 2010 May 5.
10. Locally Weighted Ensemble Clustering.
IEEE Trans Cybern. 2018 May;48(5):1460-1473. doi: 10.1109/TCYB.2017.2702343. Epub 2017 May 23.

Cited by

1. Economic Order Quantity Model-Based Optimized Fuzzy Nonlinear Dynamic Mathematical Schemes.
Comput Intell Neurosci. 2022 Jul 15;2022:3881265. doi: 10.1155/2022/3881265. eCollection 2022.
2. Clustering by fuzzy neural gas and evaluation of fuzzy clusters.
Comput Intell Neurosci. 2013;2013:165248. doi: 10.1155/2013/165248. Epub 2013 Dec 16.