Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands; Translational Metabolic Laboratory at the Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands.
Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands; Department of Biochemistry, Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands.
Anal Chim Acta. 2015 Oct 29;899:1-12. doi: 10.1016/j.aca.2015.06.042. Epub 2015 Aug 7.
Many advanced metabolomics experiments now produce data in which a large number of response variables are measured while one or several factors are varied. Often the number of response variables vastly exceeds the sample size, so well-established techniques such as multivariate analysis of variance (MANOVA) cannot be used to analyze the data. ANOVA simultaneous component analysis (ASCA) is an alternative to MANOVA for the analysis of metabolomics data from an experimental design. In this paper, we show that ASCA assumes that none of the metabolites are correlated and that they all have the same variance. Because of these assumptions, ASCA may relate the wrong variables to a factor, which reduces the power of the method and hampers interpretation. We propose an improved model that is essentially a weighted average of the ASCA and MANOVA models, with the optimal weight determined in a data-driven fashion. Unlike ASCA, this method allows variables to be correlated, leading to a more realistic view of the data. Unlike MANOVA, the model remains applicable when the number of samples is (much) smaller than the number of variables. These advantages are demonstrated by means of simulated and real data examples. The source code of the method is available from the first author upon request and from the following GitHub repository: https://github.com/JasperE/regularized-MANOVA.
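The idea of a weighted average between the two models can be illustrated with a minimal sketch. It is not the code from the repository above, and several choices in it are assumptions made only for illustration: the within-group covariance is blended with a scaled identity matrix (standing in for the ASCA assumption of uncorrelated, equal-variance variables), the weight is supplied by hand rather than estimated from the data, and the effect is summarized by a Hotelling-Lawley-type trace. The function names `within_between_scatter`, `shrunk_covariance`, and `effect_statistic` are hypothetical.

```python
import numpy as np


def within_between_scatter(X, groups):
    """Return the pooled within-group covariance and the between-group scatter."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, p = X.shape
    labels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    S_w = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        mean_g = Xg.mean(axis=0)
        centered = Xg - mean_g
        S_w += centered.T @ centered
        d = (mean_g - grand_mean)[:, None]
        S_b += Xg.shape[0] * (d @ d.T)
    S_w /= n - len(labels)  # pooled covariance estimate
    return S_w, S_b


def shrunk_covariance(S_w, weight):
    """Blend S_w with a scaled identity target (assumed form of the weighting).

    weight = 0 keeps the full covariance (MANOVA-like);
    weight = 1 keeps only a common variance (ASCA-like).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    p = S_w.shape[0]
    target = (np.trace(S_w) / p) * np.eye(p)
    return (1.0 - weight) * S_w + weight * target


def effect_statistic(X, groups, weight):
    """Trace of S_reg^{-1} S_b, with S_reg the blended covariance."""
    S_w, S_b = within_between_scatter(X, groups)
    S_reg = shrunk_covariance(S_w, weight)
    return float(np.trace(np.linalg.solve(S_reg, S_b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 50))      # 10 samples, 50 variables
    groups = np.array([0] * 5 + [1] * 5)
    X[groups == 1, :3] += 2.0          # simulated effect on three variables
    print(effect_statistic(X, groups, weight=0.5))
```

With any weight above zero the blended matrix is invertible even when there are fewer samples than variables, which is why the statistic can be computed in the setting where plain MANOVA fails. With a weight of exactly zero and more variables than samples, the within-group covariance is singular and the solve step is not reliable.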