SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis.

Author information

Nguyen Hung, Tran Duc, Tran Bang, Roy Monikrishna, Cassell Adam, Dascalu Sergiu, Draghici Sorin, Nguyen Tin

Affiliations

Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States.

Department of Computer Science, Wayne State University, Detroit, MI, United States.

Publication information

Front Oncol. 2021 Oct 20;11:725133. doi: 10.3389/fonc.2021.725133. eCollection 2021.

DOI:10.3389/fonc.2021.725133

PMID:34745946

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8563705/

Abstract

Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for a better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) the scalable analysis pipeline allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze un-matched data of different types, and (iv) the ability to offer users a convenient data analysis pipeline through a web application. We also improve the efficiency of our ensemble-based, perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also performed a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. In addition, SMRT is extremely fast, being able to analyze hundreds of thousands of samples in minutes. The web application is available at http://SMRT.tinnguyen-lab.com. 
The R package will be deposited to CRAN as part of our PINSPlus software suite.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1ce4/8563705/ac41bf4327bd/fonc-11-725133-g001.jpg

Similar articles

A Novel Method for Cancer Subtyping and Risk Prediction Using Consensus Factor Analysis.

Front Oncol. 2020 Jun 24;10:1052. doi: 10.3389/fonc.2020.01052. eCollection 2020.

CEPICS: A Comparison and Evaluation Platform for Integration Methods in Cancer Subtyping.

Front Genet. 2019 Oct 8;10:966. doi: 10.3389/fgene.2019.00966. eCollection 2019.

PINSPlus: a tool for tumor subtype discovery in integrated genomic data.

Bioinformatics. 2019 Aug 15;35(16):2843-2846. doi: 10.1093/bioinformatics/bty1049.

Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: application to cancer molecular classification.

BMC Genomics. 2015 Dec 1;16:1022. doi: 10.1186/s12864-015-2223-8.

Capturing the latent space of an Autoencoder for multi-omics integration and cancer subtyping.

Comput Biol Med. 2022 Sep;148:105832. doi: 10.1016/j.compbiomed.2022.105832. Epub 2022 Jul 5.

A network embedding based method for partial multi-omics integration in cancer subtyping.

Methods. 2021 Aug;192:67-76. doi: 10.1016/j.ymeth.2020.08.001. Epub 2020 Aug 14.

MOVICS: an R package for multi-omics integration and visualization in cancer subtyping.

Bioinformatics. 2021 Apr 1;36(22-23):5539-5541. doi: 10.1093/bioinformatics/btaa1018.

Multi-Omics Data Fusion for Cancer Molecular Subtyping Using Sparse Canonical Correlation Analysis.

Front Genet. 2021 Jul 22;12:607817. doi: 10.3389/fgene.2021.607817. eCollection 2021.

Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration.

Brief Bioinform. 2020 Dec 1;21(6):2011-2030. doi: 10.1093/bib/bbz138.

Cited by

The natural symbiotic bacterium Enterococcus faecalis LX10 drives Bombyx mori refractoriness to Nosema bombycis infection via the secretion of enterococcin.

BMC Microbiol. 2025 May 17;25(1):303. doi: 10.1186/s12866-025-03980-y.

Integrative analysis of m3C associated genes reveals METTL2A as a potential oncogene in breast Cancer.

J Transl Med. 2022 Oct 20;20(1):476. doi: 10.1186/s12967-022-03683-2.

Crosstalk Between Metabolism and Immune Activity Reveals Four Subtypes With Therapeutic Implications in Clear Cell Renal Cell Carcinoma.

Front Immunol. 2022 Apr 11;13:861328. doi: 10.3389/fimmu.2022.861328. eCollection 2022.
