Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE).

Author affiliations

Department of Computer Science and Engineering, University at Buffalo, 338 Davis Hall, Buffalo, 14260, NY, USA.

Department of Computer Science, University of Virginia, 509 Rice Hall, Charlottesville, 22904, VA, USA.

Publication information

BMC Genomics. 2019 Dec 20;20(Suppl 11):944. doi: 10.1186/s12864-019-6285-x.

DOI:10.1186/s12864-019-6285-x

PMID:31856727

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6923820/

Abstract

BACKGROUND

Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, or DNA methylation. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial for elucidating the molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promise for uncovering intricate relationships among molecular features. However, because of the "big p, small n" problem (i.e., small sample sizes with high-dimensional features), training a large-scale, generalizable deep learning model on multi-omics data alone is very challenging.
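The "big p, small n" problem can be illustrated with a small synthetic experiment. The sketch below is not from the paper: the function name `fit_random_labels`, the sample and feature counts, and the use of ordinary least squares are all assumptions chosen for illustration. It fits a linear model to pure noise and compares training and held-out error.

```python
import numpy as np

def fit_random_labels(n_samples=20, n_features=200, seed=0):
    """Fit least squares to random labels when features outnumber samples.

    All sizes are hypothetical; the labels carry no signal by construction.
    Returns (train_mse, test_mse).
    """
    rng = np.random.default_rng(seed)
    x_train = rng.standard_normal((n_samples, n_features))
    y_train = rng.standard_normal(n_samples)
    x_test = rng.standard_normal((n_samples, n_features))
    y_test = rng.standard_normal(n_samples)

    # With more features than samples, the minimum-norm least-squares
    # solution can interpolate the training labels, even pure noise.
    weights, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)

    train_mse = float(np.mean((x_train @ weights - y_train) ** 2))
    test_mse = float(np.mean((x_test @ weights - y_test) ** 2))
    return train_mse, test_mse

train_mse, test_mse = fit_random_labels()
print(f"train MSE = {train_mse:.2e}, test MSE = {test_mse:.2e}")
```

The training error is essentially zero while the held-out error is not, which is the overfitting behavior that motivates adding extra constraints to the model.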

RESULTS

We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously through deep representation learning. Both feature representations and patient representations are subject to constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduce a useful inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on TCGA datasets and demonstrated the benefit of integrating multi-omics data with biological interaction networks using our proposed method for predicting target clinical variables.
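The abstract does not give the exact form of the regularization terms. One common way to express a network constraint on embeddings is a graph Laplacian penalty, which is small when connected nodes have similar embeddings. The sketch below assumes that form for a single view with a plain linear factorization; the function names, the weights `alpha` and `beta`, and the patient similarity network are hypothetical and should not be read as the authors' implementation.

```python
import numpy as np

def graph_laplacian(adj):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def laplacian_penalty(emb, adj):
    """trace(E^T L E), which equals 0.5 * sum_ij A_ij * ||e_i - e_j||^2."""
    return float(np.trace(emb.T @ graph_laplacian(adj) @ emb))

def factorization_objective(x, patient_emb, feature_emb,
                            feature_adj, patient_adj,
                            alpha=0.1, beta=0.1):
    """Reconstruction error plus network penalties (assumed form, one view).

    x           : (n_patients, n_features) data matrix for one omics view
    patient_emb : (n_patients, k) patient embeddings
    feature_emb : (n_features, k) feature embeddings
    feature_adj : (n_features, n_features) interaction network
    patient_adj : (n_patients, n_patients) patient similarity network
    """
    recon = patient_emb @ feature_emb.T
    recon_loss = float(np.mean((x - recon) ** 2))
    return (recon_loss
            + alpha * laplacian_penalty(feature_emb, feature_adj)
            + beta * laplacian_penalty(patient_emb, patient_adj))
```

In a multi-view setting, a term of this kind could be summed over the omics types, and a deep autoencoder would take the place of the linear product used here for brevity.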

CONCLUSIONS

Deep learning on multi-omics data is prone to overfitting because of the "big p, small n" problem; incorporating biological domain knowledge into the model as an inductive bias helps alleviate it. Designing machine learning models that seamlessly integrate large-scale multi-omics data with biomedical domain knowledge is a promising way to uncover intricate relationships among molecular and clinical features.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/67da/6923820/946681124b4f/12864_2019_6285_Fig1_HTML.jpg

Similar articles

Integrating multi-omics data by learning modality invariant representations for improved prediction of overall survival of cancer.

Methods. 2021 May;189:74-85. doi: 10.1016/j.ymeth.2020.07.008. Epub 2020 Aug 5.

AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification.

Comput Biol Med. 2024 Jul;177:108614. doi: 10.1016/j.compbiomed.2024.108614. Epub 2024 May 14.

MOSDNET: A multi-omics classification framework using simplified multi-view deep discriminant representation learning and dynamic edge GCN with multi-task learning.

Comput Biol Med. 2024 Oct;181:109040. doi: 10.1016/j.compbiomed.2024.109040. Epub 2024 Aug 20.

Integrating multi-omics data through deep learning for accurate cancer prognosis prediction.

Comput Biol Med. 2021 Jul;134:104481. doi: 10.1016/j.compbiomed.2021.104481. Epub 2021 May 9.

MODILM: towards better complex diseases classification using a novel multi-omics data integration learning model.

BMC Med Inform Decis Mak. 2023 May 5;23(1):82. doi: 10.1186/s12911-023-02173-9.

A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data.

BMC Bioinformatics. 2019 Oct 28;20(1):527. doi: 10.1186/s12859-019-3116-7.

Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis.

BMC Med Inform Decis Mak. 2020 Sep 15;20(1):225. doi: 10.1186/s12911-020-01225-8.

Multi-omics integration method based on attention deep learning network for biomedical data classification.

Comput Methods Programs Biomed. 2023 Apr;231:107377. doi: 10.1016/j.cmpb.2023.107377. Epub 2023 Jan 27.

MOGAT: A Multi-Omics Integration Framework Using Graph Attention Networks for Cancer Subtype Prediction.

Int J Mol Sci. 2024 Feb 28;25(5):2788. doi: 10.3390/ijms25052788.

Cited by

MOLUNGN: a multi-omics graph neural network for biomarker discovery and accurate lung cancer classification.

Front Genet. 2025 Jun 4;16:1610284. doi: 10.3389/fgene.2025.1610284. eCollection 2025.

GAIN-BRCA: a graph-based AI-net framework for breast cancer subtype classification using multiomics data.

Bioinform Adv. 2025 May 14;5(1):vbaf116. doi: 10.1093/bioadv/vbaf116. eCollection 2025.

Network-based analyses of multiomics data in biomedicine.

BioData Min. 2025 May 27;18(1):37. doi: 10.1186/s13040-025-00452-x.

Data Interoperability and Harmonization in Cardiovascular Genomic and Precision Medicine.

Circ Genom Precis Med. 2025 Jun;18(3):e004624. doi: 10.1161/CIRCGEN.124.004624. Epub 2025 May 9.

Key genes altered in glioblastoma based on bioinformatics (Review).

Oncol Lett. 2025 Mar 24;29(5):243. doi: 10.3892/ol.2025.14989. eCollection 2025 May.

BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation.

bioRxiv. 2024 Dec 9:2024.12.05.627020. doi: 10.1101/2024.12.05.627020.

Deep learning-based approaches for multi-omics data integration and analysis.

BioData Min. 2024 Oct 2;17(1):38. doi: 10.1186/s13040-024-00391-z.

Unfolding and de-confounding: biologically meaningful causal inference from longitudinal multi-omic networks using METALICA.

mSystems. 2024 Oct 22;9(10):e0130323. doi: 10.1128/msystems.01303-23. Epub 2024 Sep 6.

Medical-informed machine learning: integrating prior knowledge into medical decision systems.

BMC Med Inform Decis Mak. 2024 Jun 28;24(Suppl 4):186. doi: 10.1186/s12911-024-02582-4.

Integrative Multi-Omics Analysis for Etiology Classification and Biomarker Discovery in Stroke: Advancing towards Precision Medicine.

Biology (Basel). 2024 May 13;13(5):338. doi: 10.3390/biology13050338.

References

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark.

IEEE Trans Knowl Data Eng. 2022 Oct;34(10):4854-4873. doi: 10.1109/tkde.2020.3045924. Epub 2020 Dec 21.

Multimodal Machine Learning: A Survey and Taxonomy.

IEEE Trans Pattern Anal Mach Intell. 2019 Feb;41(2):423-443. doi: 10.1109/TPAMI.2018.2798607. Epub 2018 Jan 25.

Integrated Molecular Characterization of Testicular Germ Cell Tumors.

Cell Rep. 2018 Jun 12;23(11):3392-3406. doi: 10.1016/j.celrep.2018.05.039.

An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics.

Cell. 2018 Apr 5;173(2):400-416.e11. doi: 10.1016/j.cell.2018.02.052.

Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation.

Cell. 2018 Apr 5;173(2):338-354.e15. doi: 10.1016/j.cell.2018.03.034.

The Cancer Genome Atlas: Creating Lasting Value beyond Its Data.

Cell. 2018 Apr 5;173(2):283-285. doi: 10.1016/j.cell.2018.03.042.

Machine Learning Detects Pan-cancer Ras Pathway Activation in The Cancer Genome Atlas.

Cell Rep. 2018 Apr 3;23(1):172-180.e3. doi: 10.1016/j.celrep.2018.03.046.

Using deep learning to model the hierarchical structure and function of a cell.

Nat Methods. 2018 Apr;15(4):290-298. doi: 10.1038/nmeth.4627. Epub 2018 Mar 5.

DeepNano: Deep recurrent neural networks for base calling in MinION nanopore reads.

PLoS One. 2017 Jun 5;12(6):e0178751. doi: 10.1371/journal.pone.0178751. eCollection 2017.

Multi-omic data integration enables discovery of hidden biological regularities.

Nat Commun. 2016 Oct 26;7:13091. doi: 10.1038/ncomms13091.
