Integrative clustering of multi-level 'omic data based on non-negative matrix factorization algorithm.

Authors

Chalise Prabhakar, Fridley Brooke L

Affiliations

Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, United States of America.

Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America.

Publication information

PLoS One. 2017 May 1;12(5):e0176278. doi: 10.1371/journal.pone.0176278. eCollection 2017.

DOI:10.1371/journal.pone.0176278

PMID:28459819

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5411077/

Abstract

Integrative analyses of high-throughput 'omic data, such as DNA methylation, DNA copy number alteration, mRNA and protein expression levels, have created unprecedented opportunities to understand the molecular basis of human disease. In particular, integrative analyses have been the cornerstone in the study of cancer to determine molecular subtypes within a given cancer. As malignant tumors with similar morphological characteristics have been shown to exhibit entirely different molecular profiles, there has been significant interest in using multiple 'omic data for the identification of novel molecular subtypes of disease, which could impact treatment decisions. Therefore, we have developed intNMF, an integrative approach for disease subtype classification based on non-negative matrix factorization. The proposed approach carries out integrative clustering of multiple high dimensional molecular data in a single comprehensive analysis utilizing the information across multiple biological levels assessed on the same individual. As intNMF does not assume any distributional form for the data, it has obvious advantages over other model based clustering methods which require specific distributional assumptions. Application of intNMF is illustrated using both simulated and real data from The Cancer Genome Atlas (TCGA).
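The idea of clustering several data types measured on the same individuals through one factorization can be illustrated with a small sketch. It assumes each data type is a non-negative matrix X_i (samples by features) approximated as W H_i, where W is shared across data types and each sample is assigned to the factor with the largest entry in its row of W. The function names, the weights, and the multiplicative update rule are illustrative assumptions taken from generic joint NMF; the abstract does not specify the optimization used by intNMF, so this is not a reproduction of the authors' implementation.

```python
import numpy as np

def objective(blocks, W, Hs, weights=None):
    """Weighted sum of squared reconstruction errors over all data types."""
    if weights is None:
        weights = [1.0] * len(blocks)
    return sum(t * np.linalg.norm(X - W @ H) ** 2
               for t, X, H in zip(weights, blocks, Hs))

def joint_nmf(blocks, k, weights=None, n_iter=200, seed=0, eps=1e-10):
    """Approximate each non-negative X_i (n x p_i) as W @ H_i with a shared W.

    Hypothetical helper for illustration; uses Lee-Seung style
    multiplicative updates, which keep W and every H_i non-negative.
    """
    rng = np.random.default_rng(seed)
    n = blocks[0].shape[0]
    if weights is None:
        weights = [1.0] * len(blocks)
    W = rng.random((n, k)) + eps
    Hs = [rng.random((k, X.shape[1])) + eps for X in blocks]
    for _ in range(n_iter):
        # Update each data-type-specific H_i with the shared W held fixed.
        Hs = [H * (W.T @ X) / (W.T @ W @ H + eps) for X, H in zip(blocks, Hs)]
        # Update the shared W using every data type at once.
        num = sum(t * (X @ H.T) for t, X, H in zip(weights, blocks, Hs))
        den = sum(t * (W @ (H @ H.T)) for t, H in zip(weights, Hs)) + eps
        W = W * num / den
    labels = W.argmax(axis=1)  # cluster = dominant factor for each sample
    return W, Hs, labels

# Toy data (made up): 20 samples in two groups, measured on two "data types".
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 10)
X1 = rng.random((20, 6)) + 3.0 * (group == 0)[:, None]
X2 = rng.random((20, 8)) + 3.0 * (group == 1)[:, None]

W, Hs, labels = joint_nmf([X1, X2], k=2)
print(labels)
```

Because the samples are shared, a single W ties the data types together, while each H_i lets a data type keep its own feature loadings. No distributional form is assumed at any step; the only requirement in this sketch is that the inputs are non-negative.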


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/875a/5411077/22a350a8faad/pone.0176278.g001.jpg

Similar articles

InterSIM: Simulation tool for multiple integrative 'omic datasets'.

Comput Methods Programs Biomed. 2016 May;128:69-74. doi: 10.1016/j.cmpb.2016.02.011. Epub 2016 Feb 27.

Randomized singular value decomposition for integrative subtype analysis of 'omics data' using non-negative matrix factorization.

Stat Appl Genet Mol Biol. 2023 Nov 9;22(1). doi: 10.1515/sagmb-2022-0047. eCollection 2023 Jan 1.

COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms.

PLoS Comput Biol. 2024 Aug 5;20(8):e1012275. doi: 10.1371/journal.pcbi.1012275. eCollection 2024 Aug.

Integrative clustering methods for high-dimensional molecular data.

Transl Cancer Res. 2014 Jun 1;3(3):202-216. doi: 10.3978/j.issn.2218-676X.2014.06.03.

Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis.

Bioinformatics. 2009 Nov 15;25(22):2906-12. doi: 10.1093/bioinformatics/btp543. Epub 2009 Sep 16.

Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

PLoS Comput Biol. 2017 Oct 16;13(10):e1005781. doi: 10.1371/journal.pcbi.1005781. eCollection 2017 Oct.

Network-based integrative clustering of multiple types of genomic data using non-negative matrix factorization.

Predicting censored survival data based on the interactions between meta-dimensional omics data in breast cancer.

J Biomed Inform. 2015 Aug;56:220-8. doi: 10.1016/j.jbi.2015.05.019. Epub 2015 Jun 3.

Subtype identification from heterogeneous TCGA datasets on a genomic scale by multi-view clustering with enhanced consensus.

BMC Med Genomics. 2017 Dec 21;10(Suppl 4):75. doi: 10.1186/s12920-017-0306-x.

Cited by

Molecular subtypes of human skeletal muscle in cancer cachexia.

Nature. 2025 Sep 10. doi: 10.1038/s41586-025-09502-0.

stImage: a versatile framework for optimizing spatial transcriptomic analysis through customizable deep histology and location informed integration.

Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf429.

A review on multi-omics integration for aiding study design of large scale TCGA cancer datasets.

BMC Genomics. 2025 Aug 22;26(1):769. doi: 10.1186/s12864-025-11925-y.

A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf355.

Retrotransposon methylation profiles and survival in Black women with high-grade serous ovarian carcinoma.

Clin Epigenetics. 2025 Jul 30;17(1):134. doi: 10.1186/s13148-025-01942-9.

MOTL: enhancing multi-omics matrix factorization with transfer learning.

Genome Biol. 2025 Jul 25;26(1):224. doi: 10.1186/s13059-025-03675-7.

GAUDI: interpretable multi-omics integration with UMAP embeddings and density-based clustering.

Nat Commun. 2025 Jul 1;16(1):5771. doi: 10.1038/s41467-025-60822-1.

3Mont: A multi-omics integrative tool for breast cancer subtype stratification.

PLoS One. 2025 Jun 27;20(6):e0326154. doi: 10.1371/journal.pone.0326154. eCollection 2025.

EMitool: Explainable Multi-Omics Integration for Disease Subtyping.

Int J Mol Sci. 2025 Apr 30;26(9):4268. doi: 10.3390/ijms26094268.

Do we need a standardized 16S rRNA gene amplicon sequencing analysis protocol for poultry microbiota research?

Poult Sci. 2025 Jul;104(7):105242. doi: 10.1016/j.psj.2025.105242. Epub 2025 May 1.

