Integrative clustering methods for multi-omics data.

Authors

Zhang Xiaoyu, Zhou Zhenwei, Xu Hanfei, Liu Ching-Ti

Affiliation

Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA.

Publication information

Wiley Interdiscip Rev Comput Stat. 2022 May-Jun;14(3). doi: 10.1002/wics.1553. Epub 2021 Feb 7.

DOI:10.1002/wics.1553

PMID:35573155

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9097984/

Abstract

Integrative analysis of multi-omics data has drawn much attention from the scientific community because technological advances have made many types of omics data available. Leveraging these data can potentially provide a more comprehensive view of disease mechanisms and biological processes. Integrative multi-omics clustering is an unsupervised integrative method used to find coherent groups of samples or features by drawing on information across multiple omics data types. It aims to better stratify diseases and to suggest biological mechanisms and potential targeted therapies. However, applying integrative multi-omics clustering is both statistically and computationally challenging, for reasons such as high dimensionality and heterogeneity. In this review, we summarize integrative multi-omics clustering methods in three general categories: concatenated clustering, clustering of clusters, and interactive clustering, based on when and how the multi-omics data are processed for clustering. We further classify the methods within each category into different approaches based on the main statistical strategy used during clustering. In addition, we provide recommended practices tailored to four real-life scenarios to help researchers choose among integrative multi-omics clustering methods in future studies.
