Addressing big data challenges in mass spectrometry-based metabolomics.

Affiliation

Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, BC, V6T 1Z1, Canada.

Publication information

Chem Commun (Camb). 2022 Sep 8;58(72):9979-9990. doi: 10.1039/d2cc03598g.

Abstract

Advancements in computer science and software engineering have greatly facilitated mass spectrometry (MS)-based untargeted metabolomics. Nowadays, gigabytes of metabolomics data are routinely generated from MS platforms, containing condensed structural and quantitative information from thousands of metabolites. Manual data processing is almost impossible due to the large data size. Therefore, in the "omics" era, we are faced with new challenges, the big data challenges of how to accurately and efficiently process the raw data, extract the biological information, and visualize the results from the gigantic amount of collected data. Although important, proposing solutions to address these big data challenges requires broad interdisciplinary knowledge, which can be challenging for many metabolomics practitioners. Our laboratory in the Department of Chemistry at the University of British Columbia is committed to combining analytical chemistry, computer science, and statistics to develop bioinformatics tools that address these big data challenges. In this Feature Article, we elaborate on the major big data challenges in metabolomics, including data acquisition, feature extraction, quantitative measurements, statistical analysis, and metabolite annotation. We also introduce our recently developed bioinformatics solutions for these challenges. Notably, all of the bioinformatics tools and source codes are freely available on GitHub (https://www.github.com/HuanLab), along with revised and regularly updated content.
