Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, 68583-0963, USA.
Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE, 68588-0304, USA.
Metabolomics. 2018 Aug 10;14(8):108. doi: 10.1007/s11306-018-1400-6.
Failure to properly account for normal systematic variations in OMICS datasets may result in misleading biological conclusions. Accordingly, normalization is a necessary step in the proper preprocessing of OMICS datasets. In this regard, an optimal normalization method will effectively reduce unwanted biases and increase the accuracy of downstream quantitative analyses. However, it is currently unclear which normalization method is best, since each algorithm addresses systematic noise differently.
To determine the optimal normalization method for the preprocessing of metabolomics datasets.
Nine normalization algorithms implemented in MVAPACK were compared using simulated and experimental NMR spectra modified with added Gaussian noise and random dilution factors. Methods were evaluated on their ability to recover the intensities of the true spectral peaks and on the reproducibility of the true classifying features from an orthogonal projections to latent structures-discriminant analysis (OPLS-DA) model.
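The kind of perturbation described above can be sketched as follows. This is a minimal illustration, not MVAPACK code: it assumes each spectrum is a plain list of binned intensities, that a single dilution factor drawn uniformly from a range scales the whole spectrum, and that the Gaussian noise has a standard deviation equal to a fraction (`noise_level`) of each diluted intensity. The helper name `perturb_spectrum`, the default dilution range, and the noise model are all hypothetical choices made for this example.

```python
import random


def perturb_spectrum(spectrum, noise_level, dilution_range=(0.5, 1.5), rng=None):
    """Return (noisy_spectrum, dilution_factor) for a hypothetical simulation.

    Assumptions (not taken from the study): one dilution factor, drawn
    uniformly from `dilution_range`, multiplies every point; then Gaussian
    noise with standard deviation `noise_level * |intensity|` is added
    independently to each point.
    """
    if rng is None:
        rng = random.Random()
    dilution = rng.uniform(*dilution_range)
    # Apply the sample-wide dilution first, as a concentration change would.
    diluted = [dilution * x for x in spectrum]
    # Add intensity-proportional Gaussian noise point by point.
    noisy = [x + rng.gauss(0.0, noise_level * abs(x)) for x in diluted]
    return noisy, dilution


rng = random.Random(0)
true_spectrum = [0.0, 2.0, 10.0, 3.0, 0.5]
noisy, factor = perturb_spectrum(true_spectrum, noise_level=0.2, rng=rng)
```

Because the noise here scales with intensity, a zero-intensity point stays at zero, and setting `noise_level=0.0` reduces the perturbation to a pure dilution, which is convenient for checking that a normalization method removes the dilution factor exactly.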
Most normalization methods (except histogram matching) performed equally well at modest levels of signal variance. Only probabilistic quotient (PQ) and constant sum (CS) maintained the highest level of peak recovery (> 67%) and correlation with true loadings (> 0.6) at maximal noise.
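The two best-performing methods can be sketched in a few lines. This is an illustrative version of the commonly described recipes, not MVAPACK's implementation: constant sum (CS) divides each spectrum by its total intensity, and probabilistic quotient (PQ) first applies CS, builds a reference spectrum as the point-wise median across all spectra, and then divides each spectrum by the median of its point-wise quotients against that reference. The function names, the choice of the median spectrum as the reference, and the rule of skipping non-positive reference points are assumptions made for this sketch.

```python
from statistics import median


def constant_sum(spectrum, total=1.0):
    """Scale a spectrum so its intensities sum to `total` (CS normalization)."""
    s = sum(spectrum)
    if s == 0:
        raise ValueError("cannot constant-sum normalize an all-zero spectrum")
    return [total * x / s for x in spectrum]


def probabilistic_quotient(spectra):
    """Sketch of PQ normalization for a list of equal-length spectra."""
    # Step 1: integral (constant-sum) normalization of every spectrum.
    cs = [constant_sum(sp) for sp in spectra]
    # Step 2: reference spectrum = point-wise median across all spectra.
    reference = [median(col) for col in zip(*cs)]
    normalized = []
    for sp in cs:
        # Step 3: quotients against the reference, skipping points where the
        # reference is not positive (assumed rule, to avoid division by zero).
        quotients = [x / r for x, r in zip(sp, reference) if r > 0]
        # Step 4: the median quotient estimates the most probable dilution.
        q = median(quotients)
        normalized.append([x / q for x in sp])
    return normalized


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0 * x for x in a]      # same sample, twice as concentrated
c = [1.0, 2.0, 3.0, 40.0]     # one metabolite strongly increased
pq = probabilistic_quotient([a, b, c])
```

In this toy case, CS alone maps `a` and `b` to the same spectrum, but the single large peak in `c` shrinks all of its other CS-normalized intensities. PQ instead rescales `c` by its median quotient, so its unchanged peaks come back to the same values as in `a` (about `[0.1, 0.2, 0.3]`) while the increased peak remains large.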
PQ and CS performed the best at recovering peak intensities and reproducing the true classifying features for an OPLS-DA model regardless of spectral noise level. Our findings suggest that performance is largely determined by the level of noise in the dataset, while the effect of dilution factors was negligible. A minimal allowable noise level of 20% was also identified for a valid NMR metabolomics dataset.