Department of Chemistry and Biochemistry, University of California, San Diego, CA, 92093, USA.
Department of Chemistry and Biochemistry, Miami University, Oxford, OH, 45056, USA.
Metabolomics. 2023 Jun 28;19(7):64. doi: 10.1007/s11306-023-02027-5.
Interpretation and analysis of NMR-based metabolic profiling studies are limited by substantially incomplete commercial and academic databases. Statistical significance measures, including p-values, variable importance in projection (VIP) scores, area under the curve (AUC) values and fold change (FC) values, can be largely inconsistent. Data normalization prior to statistical analysis can cause erroneous outcomes.
The objectives were (1) to quantitatively assess consistency among p-values, VIP scores, AUC values and FC values in representative NMR-based metabolic profiling datasets, (2) to assess how data normalization can impact statistical significance outcomes, (3) to determine how completely resonance peaks can be assigned using commonly used databases and (4) to analyze the overlap and uniqueness of the metabolite space covered by these databases.
P-values, VIP scores, AUC values and FC values, and their dependence on data normalization, were determined in an orthotopic mouse model of pancreatic cancer and two human pancreatic cancer cell lines. Completeness of resonance assignments was evaluated using Chenomx, the Human Metabolome Database (HMDB) and the COLMAR database. The intersection and uniqueness of the databases were quantified.
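As an illustration of how two of these measures can shift with normalization, the following minimal sketch computes FC and AUC for each spectral bin with and without scaling each spectrum to unit total intensity. The toy intensities, the function names and the choice of total-intensity normalization are assumptions made for illustration; the abstract does not specify which normalization was used.

```python
from statistics import mean


def fold_change(case, control):
    """Ratio of group means for one spectral bin."""
    return mean(case) / mean(control)


def auc(case, control):
    """ROC AUC as the fraction of (case, control) pairs in which the
    case value is larger; ties count as half a win."""
    wins = sum((x > y) + 0.5 * (x == y) for x in case for y in control)
    return wins / (len(case) * len(control))


def normalize_total(spectrum):
    """Scale a spectrum so its bins sum to 1 (assumed normalization)."""
    total = sum(spectrum)
    return [v / total for v in spectrum]


def per_bin(case_rows, control_rows):
    """Return a (fold change, AUC) tuple for every bin."""
    results = []
    for b in range(len(case_rows[0])):
        case_bin = [row[b] for row in case_rows]
        control_bin = [row[b] for row in control_rows]
        results.append((fold_change(case_bin, control_bin),
                        auc(case_bin, control_bin)))
    return results


# Hypothetical data: rows are samples, columns are spectral bins.
# Only the first bin differs between the two groups.
control = [[10.0, 5.0, 1.0], [12.0, 5.0, 1.0]]
case = [[20.0, 5.0, 1.0], [24.0, 5.0, 1.0]]

raw = per_bin(case, control)
norm = per_bin([normalize_total(r) for r in case],
               [normalize_total(r) for r in control])

for b, (r, n) in enumerate(zip(raw, norm)):
    print(f"bin {b}: raw FC={r[0]:.2f} AUC={r[1]:.2f} | "
          f"normalized FC={n[0]:.2f} AUC={n[1]:.2f}")
```

In this toy case the second and third bins are identical between groups before normalization, yet after normalization they appear reduced in the case group, because the increase in the first bin inflates each case spectrum's total.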
P-values and AUC values were more strongly correlated with each other than with VIP scores or FC values. Distributions of statistically significant bins depended strongly on whether or not datasets were normalized. 40-45% of peaks had either no or ambiguous database matches. 9-22% of metabolites were unique to each database.
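The intersection and uniqueness figures can be pictured with plain set operations. The sketch below is only an illustration: the metabolite lists are invented and do not reflect the actual contents of Chenomx, HMDB or COLMAR, and the choice of denominator (each database's own size) is an assumption about how such a percentage might be defined.

```python
# Hypothetical metabolite coverage for each database (invented entries).
databases = {
    "Chenomx": {"alanine", "lactate", "glucose", "taurine"},
    "HMDB": {"alanine", "lactate", "glucose", "citrate", "creatine"},
    "COLMAR": {"alanine", "lactate", "citrate", "betaine"},
}


def unique_fraction(name, dbs):
    """Fraction of one database's metabolites found in no other database."""
    others = set().union(*(s for n, s in dbs.items() if n != name))
    unique = dbs[name] - others
    return len(unique) / len(dbs[name])


# Metabolites present in every database.
shared = set.intersection(*databases.values())

# Per-database fraction of unique metabolites.
fractions = {name: unique_fraction(name, databases) for name in databases}

print("shared by all:", sorted(shared))
for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} unique")
```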
Lack of consistency in statistical analyses of metabolomics data can lead to misleading or inconsistent interpretation. Data normalization can have large effects on statistical analysis and should be justified. About 40% of peak assignments remain ambiguous or impossible with current databases. 1D and 2D databases should be made consistent to maximize metabolite assignment confidence and validation.