Community Partitioning over Feature-Rich Networks Using an Extended K-Means Method.

Author information

Shalileh Soroosh, Mirkin Boris

Affiliations

Center for Language and Brain, HSE University, Myasnitskaya Ulitsa 20, 101000 Moscow, Russia.

Department of Data Analysis and Artificial Intelligence, HSE University, Pokrovsky Boulevard, 11, 101000 Moscow, Russia.

Publication information

Entropy (Basel). 2022 Apr 29;24(5):626. doi: 10.3390/e24050626.

Abstract

This paper proposes a meaningful and effective extension of the celebrated K-means algorithm for detecting communities in feature-rich networks, made possible by our assumption of a non-summability mode. We approximate the given matrices of inter-node links and feature values in the least-squares sense, which leads to a straightforward extension of the conventional K-means clustering method as an alternating minimization strategy for the criterion. The method works in a two-fold space that embraces both the network nodes and the features. The metric used is a weighted sum of the squared Euclidean distances in the feature and network spaces. To tackle the so-called curse of dimensionality, we extend this to a version that uses the cosine distances between entities and centers. One more version of our method is based on the Manhattan distance metric. We conduct computational experiments to test our method and compare its performance with that of popular competing algorithms on synthetic and real-world datasets. The cosine-based version of the extended K-means typically wins on the high-dimensional real-world datasets. In contrast, the Manhattan-based version wins on most synthetic datasets.
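
The alternating minimization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `extended_kmeans`, the weights `rho` and `xi`, the random initialization, and the stopping rule are all assumptions made for illustration, and the abstract does not specify any data normalization. Each node is represented by its feature row together with its row of the link matrix, and each cluster has one center in each space.

```python
import numpy as np

def extended_kmeans(Y, P, k, rho=1.0, xi=1.0, n_iter=100, seed=0):
    """Illustrative K-means over a feature matrix Y (n x v) and a link matrix P (n x n).

    rho and xi are hypothetical weights of the feature and network distances.
    """
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    rng = np.random.default_rng(seed)
    n = Y.shape[0]

    # Assumed initialization: k distinct nodes serve as the starting centers.
    idx = rng.choice(n, size=k, replace=False)
    c_feat, c_net = Y[idx].copy(), P[idx].copy()
    labels = np.full(n, -1)

    for _ in range(n_iter):
        # Assignment step: weighted sum of squared Euclidean distances
        # in the feature space and in the network space.
        d_feat = ((Y[:, None, :] - c_feat[None, :, :]) ** 2).sum(axis=2)
        d_net = ((P[:, None, :] - c_net[None, :, :]) ** 2).sum(axis=2)
        new_labels = np.argmin(rho * d_feat + xi * d_net, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the criterion cannot decrease further
        labels = new_labels

        # Update step: within-cluster means minimize the least-squares criterion.
        for j in range(k):
            members = labels == j
            if members.any():
                c_feat[j] = Y[members].mean(axis=0)
                c_net[j] = P[members].mean(axis=0)

    return labels, c_feat, c_net
```

On a toy network of two three-node cliques whose feature vectors differ between cliques, calling `extended_kmeans(Y, P, k=2)` separates the two cliques. The cosine and Manhattan versions mentioned in the abstract would replace the distance in the assignment step (and, for Manhattan, presumably the center update); they are not shown here.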
