Multiple kernel learning for integrative consensus clustering of omic datasets.

Affiliations

MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK.

Cambridge Institute of Therapeutic Immunology & Infectious Disease, University of Cambridge, Cambridge CB2 0AW, UK.

Publication information

Bioinformatics. 2020 Sep 15;36(18):4789-4796. doi: 10.1093/bioinformatics/btaa593.

DOI: 10.1093/bioinformatics/btaa593

PMID: 32592464

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7750932/

Abstract

MOTIVATION

Diverse applications, particularly in tumour subtyping, have demonstrated the importance of integrative clustering techniques for combining information from multiple data sources. Cluster Of Clusters Analysis (COCA) is one such approach that has been widely applied in the context of tumour subtyping. However, the properties of COCA have never been systematically explored, and its robustness to the inclusion of noisy datasets is unclear.
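
As a rough illustration of the first stage of COCA as it is generally described in the literature (the abstract itself does not spell out the algorithm), each dataset is clustered separately and the resulting labels are stacked into a binary "matrix of clusters", which is then passed to a consensus clustering step. The Python sketch below builds only that matrix. The function names and the toy labels are hypothetical and are not taken from the coca R package.

```python
import numpy as np

def one_hot(labels):
    """Return an (items x clusters) 0/1 indicator matrix for one set of labels."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    return (labels[:, None] == classes[None, :]).astype(int)

def matrix_of_clusters(label_sets):
    """Concatenate the indicator matrices of several clusterings column-wise."""
    return np.hstack([one_hot(labels) for labels in label_sets])

# Hypothetical per-dataset cluster labels for the same six samples.
expression_labels = [0, 0, 1, 1, 2, 2]
methylation_labels = [0, 0, 0, 1, 1, 1]

moc = matrix_of_clusters([expression_labels, methylation_labels])
print(moc.shape)  # one row per sample, one column per dataset-specific cluster
```

Every row contains exactly one 1 per dataset, so in this representation each dataset contributes equally to the final clustering whether or not it is informative.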

RESULTS

We rigorously benchmark COCA, and present Kernel Learning Integrative Clustering (KLIC) as an alternative strategy. KLIC frames the challenge of combining clustering structures as a multiple kernel learning problem, in which different datasets each provide a weighted contribution to the final clustering. This allows the contribution of noisy datasets to be down-weighted relative to more informative datasets. We compare the performances of KLIC and COCA in a variety of situations through simulation studies. We also present the output of KLIC and COCA in real data applications to cancer subtyping and transcriptional module discovery.
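
The weighted-contribution idea can be sketched as follows, assuming each dataset has already been summarised as a kernel matrix (here a simple co-clustering matrix, which is positive semi-definite). This is only an illustration: the weights are fixed by hand, whereas KLIC learns them as part of the multiple kernel learning problem, and the function names are hypothetical rather than part of the klic R package.

```python
import numpy as np

def coclustering_kernel(labels):
    """Kernel whose (i, j) entry is 1 when items i and j share a cluster label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def combine_kernels(kernels, weights):
    """Weighted sum of kernels; weights are non-negative and rescaled to sum to one."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()
    return sum(w * k for w, k in zip(weights, kernels))

# Toy example: one dataset recovers the two true groups, the other is noisy.
informative = coclustering_kernel([0, 0, 0, 1, 1, 1])
noisy = coclustering_kernel([0, 1, 0, 1, 0, 1])
target = informative  # the structure we would like the combined kernel to reflect

equal = combine_kernels([informative, noisy], [0.5, 0.5])
down = combine_kernels([informative, noisy], [0.9, 0.1])

# Frobenius distance between each combined kernel and the target structure.
dist_equal = np.linalg.norm(equal - target)
dist_down = np.linalg.norm(down - target)
print(dist_down < dist_equal)
```

With the noisy dataset down-weighted, the combined kernel sits closer to the informative structure than the equally weighted combination does. A kernel clustering method, such as kernel k-means, would then be applied to the combined kernel.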

AVAILABILITY AND IMPLEMENTATION

R packages klic and coca are available on the Comprehensive R Archive Network.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cc9/7750932/24dd8aafd5ab/btaa593f1.jpg
