Multi-view Subspace Clustering Analysis for Aggregating Multiple Heterogeneous Omics Data.

Authors

Shi Qianqian, Hu Bing, Zeng Tao, Zhang Chuanchao

Affiliations

Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China.

Department of Applied Mathematics, College of Science, Zhejiang University of Technology, Hangzhou, China.

Publication information

Front Genet. 2019 Aug 20;10:744. doi: 10.3389/fgene.2019.00744. eCollection 2019.

DOI:10.3389/fgene.2019.00744

PMID:31497031

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6712585/

Abstract

Integration of distinct biological data types could provide a comprehensive view of biological processes or complex diseases. The combinations of molecules responsible for different phenotypes form multiple embedded (expression) subspaces, thus identifying the intrinsic data structure is challenging by regular integration methods. In this paper, we propose a novel framework of "Multi-view Subspace Clustering Analysis (MSCA)," which could measure the local similarities of samples in the same subspace and obtain the global consensus sample patterns (structures) for multiple data types, thereby comprehensively capturing the underlying heterogeneity of samples. Applied to various synthetic datasets, MSCA performs effectively to recognize the predefined sample patterns, and is robust to data noises. Given a real biological dataset, i.e., Cancer Cell Line Encyclopedia (CCLE) data, MSCA successfully identifies cell clusters of common aberrations across cancer types. A remarkable superiority over the state-of-the-art methods, such as iClusterPlus, SNF, and ANF, has also been demonstrated in our simulation and case studies.
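The abstract's core idea, per-view local similarities within subspaces combined into one global consensus structure, can be illustrated with a minimal sketch. This is not the MSCA algorithm itself, whose details are not given here. It swaps in a generic stand-in: a ridge-regularized least-squares self-representation per view, a simple average of the resulting affinity matrices, and standard spectral clustering. All function names, the `lam` parameter, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def self_representation_affinity(X, lam=1.0):
    """Affinity from a ridge self-representation (assumed stand-in, not MSCA).

    X has shape (n_samples, n_features). Each sample is expressed as a
    linear combination of all samples: C = (G + lam*I)^-1 G, with G = X X^T.
    """
    G = X @ X.T
    n = G.shape[0]
    C = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(C, 0.0)              # ignore self-similarity
    W = (np.abs(C) + np.abs(C.T)) / 2.0   # symmetric, non-negative
    return W / W.max()                    # put views on a common scale

def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def spectral_labels(W, k):
    """Normalized spectral clustering on an affinity matrix W."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    S = (S + S.T) / 2.0
    _, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    U = vecs[:, -k:]                      # top-k eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

def consensus_cluster(views, k, lam=1.0):
    """Average the per-view affinities, then cluster the consensus graph."""
    W = np.mean([self_representation_affinity(X, lam) for X in views], axis=0)
    return spectral_labels(W, k), W

def make_view(rng, n_per_cluster, n_features, dim=3, noise=0.01):
    """Synthetic view: two clusters, each lying near its own random subspace."""
    blocks = []
    for _ in range(2):
        basis = rng.standard_normal((dim, n_features))
        coef = rng.standard_normal((n_per_cluster, dim))
        blocks.append(coef @ basis)
    X = np.vstack(blocks)
    return X + noise * rng.standard_normal(X.shape)

rng = np.random.default_rng(0)
views = [make_view(rng, 20, 50), make_view(rng, 20, 40)]  # same samples, two "omics" views
labels, W = consensus_cluster(views, k=2)
print(labels)
```

Averaging is the simplest way to combine views and treats every data type as equally informative. A method aimed at heterogeneous omics data would likely weight or align the views instead, so the sketch only shows the overall shape of the pipeline.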

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/070b/6712585/2704e3e4e9aa/fgene-10-00744-g001.jpg
