Department of Analytical Chemistry, Institute of Chemistry, University of Campinas, Campinas, SP, Brazil.
Analytical Chemistry Department, University of Campinas, Institute of Chemistry, Campinas, São Paulo, Brazil.
Adv Exp Med Biol. 2021;1336:243-264. doi: 10.1007/978-3-030-77252-9_12.
The present chapter describes basic aspects of the main steps of data processing in mass spectrometry-based metabolomics platforms, focusing on the main objectives and important considerations of each step. First, an overview of metabolomics and the pivotal techniques applied in the field is presented. Important features of data acquisition and preprocessing, such as data compression, noise filtering, and baseline correction, are reviewed with a focus on practical aspects. Peak detection, deconvolution, and alignment, as well as missing values, are also discussed. Special attention is given to chemical and mathematical normalization approaches and the role of quality control (QC) samples. Methods for uni- and multivariate statistical analysis, and the data pretreatment steps that can affect them, are reviewed, emphasizing the most widely used multivariate methods, i.e., principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), orthogonal partial least squares-discriminant analysis (OPLS-DA), and hierarchical cluster analysis (HCA). Criteria for model validation and the software used in data processing are also discussed. The chapter ends with some concerns about the minimum requirements for reporting metadata in metabolomics.