Exploratory analysis of metabolic changes using mass spectrometry data and graph embeddings.

Affiliations

Engineering Department, Pontificia Universidad Catolica del Peru, Lima, Peru.

Institute for Omics Sciences and Applied Biotechnology, Pontificia Universidad Catolica del Peru, Lima, Peru.

Publication information

Sci Rep. 2024 Nov 28;14(1):29570. doi: 10.1038/s41598-024-80955-5.

Abstract

Mass spectrometry (MS)-based metabolomics analysis is a powerful tool, but it comes with its own set of challenges. The MS workflow involves multiple steps before the data can be interpreted, in a stage known as data mining. Data mining is a two-step process. First, the MS data are ordered, arranged, and presented for filtering before being analyzed. Second, the filtered and reduced data are analyzed using statistics to remove further variability. This holds particularly for MS-based untargeted metabolomics studies, which focus on understanding fold changes in metabolic networks. Because filtering and identifying changes in a large dataset is challenging, automated techniques for mining untargeted MS-based metabolomic data are needed. The traditional statistics-based approach tends to overfilter raw data, which may remove relevant data and lead to the identification of fewer metabolomic changes. This limitation underscores the need for a new method. In this work, we present GEMNA (Graph Embedding-based Metabolomics Network Analysis), a novel deep learning approach that uses node embeddings (powered by graph neural networks, GNNs), edge embeddings, and an anomaly detection algorithm to analyze the data generated by MS-based metabolomics. For example, in an untargeted volatile study on Mentos candy, the data clusters produced by GEMNA were better than those produced by traditional tools: GEMNA has [Formula: see text], whereas the traditional approach has [Formula: see text].
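
To make the over-filtering concern concrete, the following is a minimal sketch of a threshold-based fold-change filter, one common form a statistics-based filtering step can take. It is not GEMNA and not necessarily the traditional pipeline the authors compared against. The feature names, intensity values, pseudocount, and threshold are all hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return {feature: log2(mean treated / mean control)} for shared features.

    `control` and `treated` map a feature name to a list of replicate
    intensities. The pseudocount avoids division by zero; its value is an
    assumption made for this sketch.
    """
    result = {}
    for feature in control:
        if feature not in treated:
            continue
        c = mean(control[feature]) + pseudocount
        t = mean(treated[feature]) + pseudocount
        result[feature] = math.log2(t / c)
    return result


def filter_by_fold_change(fold_changes, threshold=1.0):
    """Keep only features whose absolute log2 fold change meets the threshold."""
    return {f: v for f, v in fold_changes.items() if abs(v) >= threshold}


# Hypothetical intensities: three features, three replicates per group.
control = {
    "feature_a": [100.0, 110.0, 90.0],
    "feature_b": [50.0, 55.0, 45.0],
    "feature_c": [200.0, 210.0, 190.0],
}
treated = {
    "feature_a": [400.0, 420.0, 380.0],
    "feature_b": [80.0, 85.0, 75.0],
    "feature_c": [205.0, 195.0, 200.0],
}

fold_changes = log2_fold_changes(control, treated)
kept = filter_by_fold_change(fold_changes, threshold=1.0)
print(sorted(kept))
```

With these made-up numbers, `feature_b` rises by roughly 60% but falls below the hard cutoff and is discarded along with the unchanged `feature_c`. A moderate, possibly relevant change is lost before any downstream analysis sees it, which is the kind of loss the abstract attributes to over-filtering.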

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0d7/11604959/42e302f48c57/41598_2024_80955_Fig1_HTML.jpg
