Integrated multi-omics analysis of ovarian cancer using variational autoencoders.

Author information

School of Health and Life Sciences, Teesside University, Middlesbrough, TS4 3BX, UK.

School of Computing, Engineering & Digital Technologies, Teesside University, Middlesbrough, TS4 3BX, UK.

Publication information

Sci Rep. 2021 Mar 18;11(1):6265. doi: 10.1038/s41598-021-85285-4.

DOI:10.1038/s41598-021-85285-4

PMID:33737557

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7973750/

Abstract

Cancer is a complex disease that deregulates cellular functions at various molecular levels (e.g., DNA, RNA, and proteins). Integrated multi-omics analysis of data from these levels is necessary to understand the aberrant cellular functions responsible for cancer and its development. In recent years, Deep Learning (DL) approaches have become a useful tool in integrated multi-omics analysis of cancer data. However, high-dimensional multi-omics data are generally imbalanced, with far more molecular features than patient samples. This imbalance makes DL-based integrated multi-omics analysis difficult. DL-based dimensionality reduction techniques, including the variational autoencoder (VAE), are a potential solution for balancing high-dimensional multi-omics data. However, there are few VAE-based integrated multi-omics analyses, and they are limited to pan-cancer studies. In this work, we performed an integrated multi-omics analysis of ovarian cancer using the compressed features learned through VAE and an improved version of VAE, namely Maximum Mean Discrepancy VAE (MMD-VAE). First, we designed and developed a DL architecture for VAE and MMD-VAE. Then we used the architecture for mono-omics, integrated di-omics, and tri-omics data analysis of ovarian cancer through cancer sample identification, molecular subtype clustering and classification, and survival analysis. The results show that compressed features based on MMD-VAE and VAE can classify the transcriptional subtypes of the TCGA datasets with accuracies in the range of 93.2-95.5% and 87.1-95.7%, respectively. Survival analysis results also show that VAE- and MMD-VAE-based compressed representations of omics data can be used in cancer prognosis. Based on the results, we conclude that (i) VAE and MMD-VAE outperform existing dimensionality reduction techniques, (ii) integrated multi-omics analyses perform better than or comparably to their mono-omics counterparts, and (iii) MMD-VAE performs better than VAE on most omics datasets.
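The abstract does not spell out how MMD-VAE differs from a standard VAE, so the following is only a minimal sketch of the general idea, not the authors' implementation. In MMD-VAE formulations, the per-sample KL-divergence regularizer of a VAE is typically replaced by a Maximum Mean Discrepancy term that compares a batch of latent codes with samples drawn from the prior. The Gaussian kernel, the bandwidth, the latent dimension, and all function and variable names below are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth):
    """Pairwise Gaussian (RBF) kernel matrix between rows of x and rows of y."""
    # Squared Euclidean distances via broadcasting: shape (len(x), len(y)).
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

rng = np.random.default_rng(0)
latent_dim = 8  # hypothetical size of the compressed representation

# Samples from the standard normal prior over the latent space.
prior_samples = rng.standard_normal((200, latent_dim))
# Stand-ins for encoder outputs: one batch that matches the prior, one that does not.
codes_matched = rng.standard_normal((200, latent_dim))
codes_shifted = rng.standard_normal((200, latent_dim)) + 2.0

mmd_matched = mmd_squared(codes_matched, prior_samples)
mmd_shifted = mmd_squared(codes_shifted, prior_samples)
print(f"MMD^2, codes matching the prior: {mmd_matched:.4f}")
print(f"MMD^2, codes shifted away from the prior: {mmd_shifted:.4f}")
```

In a training loop, a term like `mmd_squared(codes, prior_samples)` would be added to the reconstruction loss, so that the encoder is penalized when the overall distribution of latent codes drifts away from the prior.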

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cde/7973750/047d4ed6a741/41598_2021_85285_Fig1_HTML.jpg
