Community Partitioning over Feature-Rich Networks Using an Extended K-Means Method

Authors

Shalileh Soroosh, Mirkin Boris

Affiliations

Center for Language and Brain, HSE University, Myasnitskaya Ulitsa 20, 101000 Moscow, Russia.

Department of Data Analysis and Artificial Intelligence, HSE University, Pokrovsky Boulevard, 11, 101000 Moscow, Russia.

Publication information

Entropy (Basel). 2022 Apr 29;24(5):626. doi: 10.3390/e24050626.

DOI:10.3390/e24050626

PMID:35626512

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9142054/

Abstract

This paper proposes a meaningful and effective extension of the celebrated K-means algorithm for detecting communities in feature-rich networks, motivated by our assumption of a non-summability mode. We approximate the given matrices of inter-node links and feature values in the least-squares sense, which leads to a straightforward extension of the conventional K-means clustering method as an alternating minimization strategy for the criterion. The method works in a two-fold space that embraces both the network nodes and the features. The metric used is a weighted sum of the squared Euclidean distances in the feature and network spaces. To tackle the so-called curse of dimensionality, we extend the method to a version that uses the cosine distances between entities and centers. One more version of our method is based on the Manhattan distance metric. We conduct computational experiments to test our method and compare its performance with that of popular competing algorithms on synthetic and real-world datasets. The cosine-based version of the extended K-means typically wins on the high-dimensional real-world datasets. In contrast, the Manhattan-based version wins on most synthetic datasets.
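
The alternating minimization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name extended_kmeans, the weight parameters rho and xi, the random initialization, and the toy data are all assumptions made for illustration. Each node is described by its feature row in Y and its link row in P; nodes are assigned to the center that minimizes a weighted sum of squared Euclidean distances in the two spaces, and centers are then recomputed as within-cluster means.

```python
import numpy as np

def extended_kmeans(Y, P, k, rho=1.0, xi=1.0, n_iter=100, seed=0):
    """Illustrative sketch: K-means over feature rows (Y) and link rows (P).

    Y: (n, v) feature matrix; P: (n, n) link matrix.
    rho, xi: hypothetical weights of the feature and network terms.
    """
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    # Assumed initialization: k distinct nodes picked at random as centers.
    idx = rng.choice(n, size=k, replace=False)
    cY, cP = Y[idx].copy(), P[idx].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Squared Euclidean distances to every center, in each space.
        dY = ((Y[:, None, :] - cY[None, :, :]) ** 2).sum(axis=2)
        dP = ((P[:, None, :] - cP[None, :, :]) ** 2).sum(axis=2)
        # Assignment step: minimize the weighted sum of the two distances.
        new_labels = np.argmin(rho * dY + xi * dP, axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
        # Update step: within-cluster means; an empty cluster keeps its center.
        for j in range(k):
            members = labels == j
            if members.any():
                cY[j] = Y[members].mean(axis=0)
                cP[j] = P[members].mean(axis=0)
    return labels, cY, cP

# Toy data (made up): two groups of three nodes, densely linked inside each group.
Y = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
P = np.kron(np.eye(2), np.ones((3, 3)))
labels, cY, cP = extended_kmeans(Y, P, k=2)
print(labels)
```

Replacing the two squared-distance lines with cosine or Manhattan distances would give analogues of the other two versions mentioned in the abstract, although the center update appropriate for those metrics may differ from the plain mean used here.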
