Variational autoencoders learn transferrable representations of metabolomics data.

Affiliations

Institute of Computational Biology, Helmholtz Center Munich-German Research Center for Environmental Health, 85764, Neuherberg, Germany.

Technical University of Munich-School of Life Sciences, 85354, Freising, Germany.

Publication

Commun Biol. 2022 Jun 30;5(1):645. doi: 10.1038/s42003-022-03579-3.

Abstract

Dimensionality reduction approaches are commonly used to deconvolve high-dimensional metabolomics datasets into their underlying core metabolic processes. However, current state-of-the-art methods are largely unable to capture nonlinearities in metabolomics data. Variational Autoencoders (VAEs) are a deep learning method designed to learn nonlinear latent representations that generalize to unseen data. Here, we trained a VAE on a large-scale population cohort of metabolomics measurements from human blood samples of over 4500 individuals. We analyzed the pathway composition of the latent space using a global feature importance score, which showed that latent dimensions represent distinct cellular processes. To demonstrate model generalizability, we generated latent representations of unseen metabolomics datasets on type 2 diabetes, acute myeloid leukemia, and schizophrenia, and found significant correlations with clinical patient groups. Notably, the VAE representations showed stronger effects than latent dimensions derived by linear and nonlinear principal component analysis. Taken together, we demonstrate that the VAE is a powerful method that learns biologically meaningful, nonlinear, and transferrable latent representations of metabolomics data.
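To make the VAE idea concrete, here is a minimal NumPy sketch of the standard VAE objective (negative evidence lower bound) with the reparameterization trick. It is not the authors' model: the two-layer tanh encoder/decoder, the layer sizes, the unit-variance Gaussian likelihood, and all function names (`init_params`, `encode`, `decode`, `gaussian_kl`, `negative_elbo`) are hypothetical choices for illustration, and the input is random data standing in for standardized metabolite levels. The sketch only evaluates the objective; training would require gradients, typically from an autodiff library. It also shows how a latent representation of unseen samples can be read off as the encoder mean.

```python
import numpy as np


def init_params(n_features, n_hidden, n_latent, rng, scale=0.1):
    """Hypothetical small encoder/decoder weights (random, untrained)."""
    def w(n_in, n_out):
        return rng.normal(0.0, scale, (n_in, n_out))
    return {
        "W_enc": w(n_features, n_hidden), "b_enc": np.zeros(n_hidden),
        "W_mu": w(n_hidden, n_latent), "b_mu": np.zeros(n_latent),
        "W_logvar": w(n_hidden, n_latent), "b_logvar": np.zeros(n_latent),
        "W_dec": w(n_latent, n_hidden), "b_dec": np.zeros(n_hidden),
        "W_out": w(n_hidden, n_features), "b_out": np.zeros(n_features),
    }


def encode(params, x):
    """Map samples to the mean and log-variance of a diagonal Gaussian q(z|x)."""
    h = np.tanh(x @ params["W_enc"] + params["b_enc"])
    mu = h @ params["W_mu"] + params["b_mu"]
    logvar = h @ params["W_logvar"] + params["b_logvar"]
    return mu, logvar


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(params, z):
    """Nonlinear decoder mapping latent codes back to feature space."""
    h = np.tanh(z @ params["W_dec"] + params["b_dec"])
    return h @ params["W_out"] + params["b_out"]


def gaussian_kl(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), one value per sample."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)


def negative_elbo(params, x, rng, beta=1.0):
    """Mean reconstruction error plus beta-weighted KL over the batch."""
    mu, logvar = encode(params, x)
    z = reparameterize(mu, logvar, rng)
    x_hat = decode(params, z)
    # Unit-variance Gaussian likelihood: squared error up to an additive constant.
    recon = 0.5 * np.sum((x - x_hat) ** 2, axis=1)
    kl = gaussian_kl(mu, logvar)
    return np.mean(recon + beta * kl), np.mean(recon), np.mean(kl)


rng = np.random.default_rng(0)
params = init_params(n_features=20, n_hidden=16, n_latent=4, rng=rng)
X_unseen = rng.standard_normal((5, 20))  # synthetic stand-in, not real data
loss, recon, kl = negative_elbo(params, X_unseen, rng)
latent, _ = encode(params, X_unseen)  # encoder mean as the latent representation
print(latent.shape)  # (5, 4)
```

Using the encoder mean rather than a random sample gives a deterministic embedding for downstream association tests; the tanh layers are what make this mapping nonlinear, in contrast to a linear PCA projection.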

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a761/9246987/9b0ff80603e8/42003_2022_3579_Fig1_HTML.jpg