CGC: A Flexible and Robust Approach to Integrating Co-Regularized Multi-Domain Graph for Clustering.

Author information

Cheng Wei, Guo Zhishan, Zhang Xiang, Wang Wei

Affiliations

UNC at Chapel Hill.

Case Western Reserve University.

Publication information

ACM Trans Knowl Discov Data. 2016 Jul;10(4). doi: 10.1145/2903147.

Abstract

Multi-view graph clustering aims to enhance clustering performance by integrating heterogeneous information collected in different domains. Each domain provides a different view of the data instances. Leveraging cross-domain information has been shown to be an effective way to achieve better clustering results. Despite this success, existing multi-view graph clustering methods usually assume that different views are available for the same set of instances, so that instances in different domains can be treated as having a strict one-to-one relationship. In many real-life applications, however, a data instance in one domain may correspond to multiple instances in another domain. Moreover, relationships between instances in different domains may be associated with weights based on prior (partial) knowledge. In this paper, we propose a flexible and robust framework, CGC (Co-regularized Graph Clustering), based on non-negative matrix factorization (NMF), to tackle these challenges. CGC has several advantages over existing methods. First, it supports many-to-many cross-domain instance relationships. Second, it incorporates weights on cross-domain relationships. Third, it allows partial cross-domain mapping, so that graphs in different domains may have different sizes. Finally, it reports the extent to which each cross-domain instance relationship violates the in-domain clustering structure, which enables users to re-evaluate the consistency of that relationship. We develop an efficient optimization method that is guaranteed to find the globally optimal solution with a given confidence requirement. The proposed method can automatically identify noisy domains and assign smaller weights to them, which helps to obtain an optimal graph partition for the domain of interest. Extensive experimental results on UCI benchmark data sets, newsgroup data sets, and biological interaction networks demonstrate the effectiveness of our approach.
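
The kind of objective the abstract describes can be sketched as follows. This is an illustrative formulation under assumed notation, not necessarily the exact objective used in the paper. Here $A^{(\pi)}$ is taken to be the non-negative affinity matrix of domain $\pi$ (size $n_\pi \times n_\pi$), $H^{(\pi)}$ a non-negative $n_\pi \times k$ cluster-indicator factor, $S^{(i,j)}$ an $n_j \times n_i$ weighted and possibly partial mapping from instances of domain $i$ to instances of domain $j$, $\mathcal{I}$ the set of domain pairs with known mappings, and $\lambda^{(i,j)}$ a trade-off weight. All of these symbols, and the use of a common number of clusters $k$, are assumptions made for illustration.

```latex
\min_{H^{(1)}, \dots, H^{(d)} \ge 0}\;
\sum_{\pi=1}^{d}
  \left\| A^{(\pi)} - H^{(\pi)} \bigl(H^{(\pi)}\bigr)^{\top} \right\|_F^2
\;+\;
\sum_{(i,j) \in \mathcal{I}} \lambda^{(i,j)}
  \left\| S^{(i,j)} H^{(i)} - H^{(j)} \right\|_F^2
```

In this sketch the first term fits each graph within its own domain by symmetric NMF, and the second term penalizes cluster assignments that disagree across mapped instances. Because $S^{(i,j)}$ need not be square or have full rows, the graphs may differ in size and the mapping may be many-to-many, weighted, or partial. The value of the penalty for a given pair $(i,j)$ is one plausible way to quantify how strongly that cross-domain relationship conflicts with the in-domain clustering structure.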
