Clustering Categorical Data Using Community Detection Techniques.

Affiliation

Institute of Research and Development, Duy Tan University, P809 7/25 Quang Trung, Danang 550000, Vietnam.

Publication information

Comput Intell Neurosci. 2017;2017:8986360. doi: 10.1155/2017/8986360. Epub 2017 Dec 21.

Abstract

With the advent of the k-modes algorithm, the toolbox for clustering categorical data gained an efficient tool that scales linearly in the number of data items. However, random initialization of cluster centers in k-modes makes it hard to reach a good clustering without resorting to many trials. Recently proposed methods for better initialization are deterministic and reduce the clustering cost considerably. These initialization methods differ in how the heuristic chooses the set of initial centers. In this paper, we address the clustering problem for categorical data from the perspective of community detection. Instead of initializing k modes and running several iterations, our scheme, CD-Clustering, builds an unweighted graph and detects highly cohesive groups of nodes using a fast community detection technique. The top-k detected communities by size will define the modes. Evaluation on ten real categorical datasets shows that our method outperforms the existing initialization methods for k-modes in terms of accuracy, precision, and recall in most of the cases.
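The pipeline outlined in the abstract (build an unweighted graph, detect communities, take the k largest as the source of the modes) can be illustrated with a minimal sketch. The abstract does not say how edges are chosen or which community detection technique is used, so the details here are assumptions for illustration only: two items are linked when they differ on at most `max_mismatch` attributes, and a simple label propagation routine stands in for the "fast community detection technique". The function names (`build_graph`, `label_propagation`, `cd_cluster`) are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(items, max_mismatch):
    """Unweighted graph over item indices.

    ASSUMPTION: an edge joins two items that differ on at most
    `max_mismatch` attributes. The paper's actual rule may differ.
    """
    adj = {i: set() for i in range(len(items))}
    for i, j in combinations(range(len(items)), 2):
        if hamming(items[i], items[j]) <= max_mismatch:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj, max_rounds=20):
    """Stand-in community detector (not necessarily the paper's technique).

    Each node repeatedly adopts the most common label among its
    neighbours; ties are broken by the smallest label.
    """
    labels = {v: v for v in adj}
    for _ in range(max_rounds):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = min(counts, key=lambda lab: (-counts[lab], lab))
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    communities = {}
    for v, lab in labels.items():
        communities.setdefault(lab, []).append(v)
    return list(communities.values())


def mode_of(items, members):
    """Most frequent value of each attribute among the member items."""
    columns = zip(*(items[m] for m in members))
    return tuple(Counter(col).most_common(1)[0][0] for col in columns)


def cd_cluster(items, k, max_mismatch=1):
    """Take the k largest communities, derive one mode from each,
    then assign every item to its nearest mode."""
    adj = build_graph(items, max_mismatch)
    top_k = sorted(label_propagation(adj), key=len, reverse=True)[:k]
    modes = [mode_of(items, c) for c in top_k]
    assignment = [
        min(range(len(modes)), key=lambda c: hamming(x, modes[c]))
        for x in items
    ]
    return modes, assignment


# Toy data: three attributes per item (colour, size, shape).
items = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "flat"),
    ("blue", "tiny", "square"),
]
modes, assignment = cd_cluster(items, k=2, max_mismatch=1)
print(modes)
print(assignment)
```

The pairwise graph construction in this sketch is quadratic in the number of items, so it is only meant to make the idea concrete on small inputs; it says nothing about how the paper keeps the overall method fast.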

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06d7/5753022/1122a3211fc4/CIN2017-8986360.001.jpg
