Clustering of cancer data based on Stiefel manifold for multiple views.

Affiliations

College of Mathematics and System Sciences, Xinjiang University, Urumqi, China.

School of Computer Science and Technology, Anhui University, Hefei, China.

Publication information

BMC Bioinformatics. 2021 May 25;22(1):268. doi: 10.1186/s12859-021-04195-4.

DOI:10.1186/s12859-021-04195-4

PMID:34034643

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8152349/

Abstract

BACKGROUND

In recent years, various sequencing techniques have been used to collect biomedical omics datasets, and multiple types of omics data can usually be obtained from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research and helps reveal data structure across multiple collections. Nevertheless, clustering omics data poses many challenges. The primary ones are the high dimensionality of the data and the small sample size. It is therefore difficult to find a suitable integration method for the structural analysis of multiple datasets.

RESULTS

This paper proposes MCSM, a multi-view clustering method based on the Stiefel manifold. MCSM comprises three core steps. First, we establish a binary optimization model for the simultaneous clustering problem. Second, we solve the optimization problem with a line search algorithm on the Stiefel manifold. Finally, we integrate the clustering results obtained from three omics types using the k-nearest neighbor method. We applied this approach to four cancer datasets from TCGA. The results show that our method outperforms several state-of-the-art methods, which rely on the hypothesis that the underlying cluster structure is the same across omics types.
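The second step can be illustrated with a minimal sketch of line search optimization on the Stiefel manifold (the set of n x k matrices X with X^T X = I). The abstract does not state the MCSM objective, so the objective below, f(X) = -trace(X^T A X) for a symmetric similarity matrix A, is an assumption borrowed from spectral clustering. The QR retraction, the Armijo backtracking rule, and all function and parameter names are likewise illustrative choices, not the authors' implementation.

```python
import numpy as np

def qr_retract(Y):
    """Map a full-rank n x k matrix back onto the Stiefel manifold via thin QR."""
    Q, R = np.linalg.qr(Y)
    # Flip column signs so that diag(R) >= 0, which makes the result unique.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def riemannian_grad(X, G):
    """Project the Euclidean gradient G onto the tangent space at X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0

def stiefel_line_search(A, k, steps=200, t0=1.0, beta=0.5, c=1e-4, seed=0):
    """Minimise the assumed objective f(X) = -trace(X^T A X) with X^T X = I."""
    rng = np.random.default_rng(seed)
    X = qr_retract(rng.standard_normal((A.shape[0], k)))

    def f(Z):
        return -np.trace(Z.T @ A @ Z)

    for _ in range(steps):
        # Euclidean gradient of f is -2 A X for symmetric A.
        xi = riemannian_grad(X, -2.0 * A @ X)
        sq_norm = np.sum(xi * xi)
        if sq_norm < 1e-12:
            break
        t = t0
        # Backtracking (Armijo) line search along the negative gradient.
        while f(qr_retract(X - t * xi)) > f(X) - c * t * sq_norm and t > 1e-10:
            t *= beta
        X = qr_retract(X - t * xi)
    return X

# Toy similarity matrix with two blocks of three samples each (made-up data).
A = np.kron(np.eye(2), np.ones((3, 3)))
X = stiefel_line_search(A, k=2)
```

Every iterate stays orthonormal because each trial point is retracted before the objective is evaluated, which is what distinguishes this from an unconstrained line search followed by a single projection.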

CONCLUSION

In particular, our approach outperforms the compared approaches when the underlying clusters are inconsistent across omics types. For patients with different subtypes, it identifies consistent and differential clusters at the same time.
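To make the distinction between consistent and differential clusters concrete, here is a small sketch that compares per-view cluster labels. It is an illustration under stated assumptions, not the procedure used in the paper: the function names are hypothetical, the label arrays are made-up toy data, and "consistent" is taken to mean that two patients are grouped together in every view, while "differential" means they are grouped together in some views but not all. Co-membership matrices are used because cluster labels are only defined up to a permutation within each view.

```python
import numpy as np

def co_membership(labels):
    """Boolean matrix: entry (i, j) is True when samples i and j share a label."""
    labels = np.asarray(labels)
    return labels[:, None] == labels[None, :]

def split_consistent_differential(view_labels):
    """Classify sample pairs by how their grouping agrees across views."""
    mats = np.stack([co_membership(lab) for lab in view_labels])
    together_all = mats.all(axis=0)   # grouped together in every view
    together_any = mats.any(axis=0)   # grouped together in at least one view
    consistent = together_all
    differential = together_any & ~together_all
    return consistent, differential

# Three hypothetical views (e.g. three omics types) over four patients.
views = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first view, labels permuted
    [0, 0, 0, 1],  # a different partition
]
consistent, differential = split_consistent_differential(views)
```

In this toy input, patients 0 and 1 are grouped together in all three views, whereas patients 2 and 3 are grouped together in only two of them.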

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b81/8152349/317b84b0f715/12859_2021_4195_Fig1_HTML.jpg
