Clustering Categorical Data Using Community Detection Techniques.

Affiliation

Institute of Research and Development, Duy Tan University, P809 7/25 Quang Trung, Danang 550000, Vietnam.

Publication information

Comput Intell Neurosci. 2017;2017:8986360. doi: 10.1155/2017/8986360. Epub 2017 Dec 21.

DOI:10.1155/2017/8986360

PMID:29430249

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5753022/

Abstract

With the advent of the k-modes algorithm, the toolbox for clustering categorical data gained an efficient tool that scales linearly in the number of data items. However, random initialization of cluster centers in k-modes makes it hard to reach a good clustering without resorting to many trials. Recently proposed methods for better initialization are deterministic and reduce the clustering cost considerably. These initialization methods differ in how the heuristic chooses the set of initial centers. In this paper, we address the clustering problem for categorical data from the perspective of community detection. Instead of initializing k modes and running several iterations, our scheme, CD-Clustering, builds an unweighted graph and detects highly cohesive groups of nodes using a fast community detection technique. The top-k detected communities by size define the k modes. Evaluation on ten real categorical datasets shows that our method outperforms the existing initialization methods for k-modes in terms of accuracy, precision, and recall in most cases.
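The pipeline summarized in the abstract (build an unweighted graph, detect communities, take the k largest communities as modes) can be illustrated with a minimal sketch. The abstract does not say how edges are defined or which community detection technique is used, so everything specific here is an assumption made for illustration: the edge rule (`max_mismatch`), the simple deterministic label-propagation routine, the final nearest-mode assignment step, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(records, max_mismatch):
    """Unweighted graph over records. ASSUMPTION: two records are linked
    when they differ on at most `max_mismatch` attributes."""
    adj = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        if hamming(records[i], records[j]) <= max_mismatch:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj, max_rounds=100):
    """ASSUMPTION: a simple deterministic label propagation stands in for
    the 'fast community detection technique' named in the abstract."""
    labels = {v: v for v in adj}
    for _ in range(max_rounds):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            if counts.get(labels[v], 0) == best:
                continue  # current label is already a most frequent one
            labels[v] = min(l for l, c in counts.items() if c == best)
            changed = True
        if not changed:
            break
    groups = {}
    for v, l in labels.items():
        groups.setdefault(l, []).append(v)
    return list(groups.values())


def mode_of(records, members):
    """Most frequent category per attribute among the member records."""
    n_attrs = len(records[0])
    return tuple(
        Counter(records[i][a] for i in members).most_common(1)[0][0]
        for a in range(n_attrs)
    )


def cd_clustering_sketch(records, k, max_mismatch=1):
    """Top-k communities by size define the modes; each record is then
    assigned to its nearest mode (the assignment step is an assumption)."""
    adj = build_graph(records, max_mismatch)
    communities = sorted(label_propagation(adj), key=len, reverse=True)[:k]
    modes = [mode_of(records, c) for c in communities]
    assignment = [
        min(range(len(modes)), key=lambda m: hamming(r, modes[m]))
        for r in records
    ]
    return modes, assignment


# Toy data: two groups of records that share most attribute values.
records = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]
modes, assignment = cd_clustering_sketch(records, k=2)
print(modes, assignment)
```

The pairwise graph construction in this sketch is quadratic in the number of records, so it is only meant to show the idea on small inputs.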
