Vinaixa Maria, Samino Sara, Saez Isabel, Duran Jordi, Guinovart Joan J, Yanes Oscar
Metabolomics Platform, Campus Sescelades, Edifici N2, Rovira i Virgili University, Tarragona 43007, Spain.
Institute for Research in Biomedicine (IRB Barcelona), Barcelona 08028, Spain.
Metabolites. 2012 Oct 18;2(4):775-95. doi: 10.3390/metabo2040775.
Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed to discover the features that are significantly altered between samples. By comparing the retention time and MS/MS data of a model compound with those of the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper presents a comprehensive overview of a workflow for statistical analysis to rank the relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel to all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating the mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, the assumptions of normality and homoscedasticity, and correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.
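To make the multiple-testing step concrete, here is a minimal sketch of how per-feature p-values might be adjusted and then used to rank features for follow-up MS/MS. It assumes the Benjamini-Hochberg false discovery rate procedure, which is one common choice; the abstract does not name a specific correction method. The function name `benjamini_hochberg`, the example p-values and the 0.05 cutoff are all hypothetical and serve illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of features sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, one per detected feature (not from the paper).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
q_values = benjamini_hochberg(raw_p)

# Rank features by adjusted p-value and keep those under an assumed cutoff.
ranked = sorted(range(len(raw_p)), key=lambda i: q_values[i])
selected = [i for i in ranked if q_values[i] <= 0.05]
```

In this toy input, the third and fourth features have raw p-values below 0.05 but adjusted values just above it, so they drop out of the selection. This shows why correction matters when a test is applied in parallel to every detected feature.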