School of Pharmacy, Shihezi University, Shihezi 832002, China.
Anal Chem. 2017 May 16;89(10):5342-5348. doi: 10.1021/acs.analchem.6b05152. Epub 2017 Apr 24.
Data reduction techniques in gas chromatography-mass spectrometry-based untargeted metabolomics have made the subsequent data analysis workflow clearer. However, the normalization process still perplexes researchers, and its effects are often ignored. To reveal the influence of the normalization method, five representative normalization methods (mass spectrometry total useful signal, median, probabilistic quotient normalization, remove unwanted variation-random, and systematic ratio normalization) were compared on three real data sets of different types. First, data reduction techniques were used to refine the original data. Then, quality control samples and relative log abundance plots were used to evaluate the unwanted variation and the efficiency of the normalization process. Furthermore, the potential biomarkers screened out of the differently normalized data sets by the Mann-Whitney U test, receiver operating characteristic curve analysis, random forest, and the feature selection algorithm Boruta were compared. The results indicated that choosing a normalization method was difficult: the commonly accepted rules were easy to fulfill, yet different normalization methods had unforeseen influences on both the kind and number of potential biomarkers. Lastly, an integrated strategy for normalization method selection was recommended.
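To make two of the compared methods concrete, the following is a minimal sketch of median normalization, probabilistic quotient normalization (PQN), and the relative log abundance (RLA) values behind an RLA plot. It is not the authors' implementation. The function names, the samples-by-features matrix layout, the choice of the feature-wise median across all samples as the PQN reference, and the synthetic dilution data are all assumptions made for illustration. The optional total-signal pre-scaling step that some PQN descriptions include is omitted, and all intensities are assumed to be strictly positive.

```python
import numpy as np


def median_normalize(X):
    """Divide each sample (row) by its own median intensity."""
    X = np.asarray(X, dtype=float)
    return X / np.median(X, axis=1, keepdims=True)


def pqn_normalize(X, reference=None):
    """Probabilistic quotient normalization (sketch).

    X: samples x features matrix of positive intensities.
    reference: reference profile; assumed here to default to the
    feature-wise median across all samples (QC samples could be used).
    """
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = np.median(X, axis=0)
    # Quotient of every feature against the reference profile.
    quotients = X / reference
    # The median quotient per sample estimates its dilution factor.
    factors = np.median(quotients, axis=1, keepdims=True)
    return X / factors


def relative_log_abundance(X):
    """Log2 intensities centered on each feature's median across samples."""
    log_x = np.log2(np.asarray(X, dtype=float))
    return log_x - np.median(log_x, axis=0)


# Hypothetical data: one metabolite profile measured at three dilutions.
base_profile = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
dilutions = np.array([1.0, 2.0, 0.5])
raw = np.outer(dilutions, base_profile)

pqn = pqn_normalize(raw)
med = median_normalize(raw)

# Spread of RLA values per sample, before and after normalization.
print("RLA range before:", np.ptp(relative_log_abundance(raw)))
print("RLA range after: ", np.ptp(relative_log_abundance(pqn)))
```

In this idealized case the samples differ only by a dilution factor, so both methods make the rows identical and the RLA values collapse to zero. Real data sets also contain biological differences, so the methods can agree on such summary checks while still changing which features pass downstream biomarker screening.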