Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States.
Department of Chemistry and Biochemistry and the Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio 43210, United States.
J Am Soc Mass Spectrom. 2023 Dec 6;34(12):2775-2784. doi: 10.1021/jasms.3c00295. Epub 2023 Oct 28.
To achieve high-quality omics results, systematic variability in mass spectrometry (MS) data must be adequately addressed, and effective data normalization is essential for minimizing it. Because so many normalization approaches exist and their performance depends on the data, some researchers have developed open-source academic software to help choose the best approach. While these tools benefit the community, none of them meets every user's needs, particularly the needs of users who want to test new strategies that these tools do not implement. Herein, we present a simple workflow that identifies optimal normalization strategies using straightforward evaluation metrics based on both supervised and unsupervised machine learning. The workflow is "DIY" in nature: the performance of any normalization strategy can be evaluated on any type of MS data. To demonstrate its utility, we apply this workflow to two distinct datasets, an ESI-MS dataset of lipids extracted from latent fingerprints and a MALDI-MSI dataset of metabolites in cancer spheroids, and identify the best-performing normalization strategies for each.
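The abstract describes a loop in which any normalization strategy is applied to the data and then scored with supervised and unsupervised criteria. The following is a minimal, standard-library-only Python sketch of that general idea, not the authors' implementation. It assumes each sample is a list of positive intensities over the same aligned features. Total ion current (TIC) and median normalization serve as example strategies, and the two scores are stand-ins chosen here for illustration: leave-one-out nearest-centroid accuracy as the supervised check, and the mean silhouette width of the known groups as an unsupervised-style separation check. All function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, median

# --- Example normalization strategies (hypothetical; each maps one sample to a new sample) ---

def no_norm(sample):
    """Baseline: raw intensities, unchanged."""
    return list(sample)

def tic_norm(sample):
    """Total ion current (TIC): divide by the summed intensity (assumes sum > 0)."""
    total = sum(sample)
    return [x / total for x in sample]

def median_norm(sample):
    """Median normalization: divide by the sample's median (assumes median > 0)."""
    m = median(sample)
    return [x / m for x in sample]

# --- Evaluation metrics (stand-ins, not necessarily those used in the paper) ---

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def loo_centroid_accuracy(samples, labels):
    """Supervised check: leave-one-out accuracy of a nearest-centroid classifier."""
    classes = sorted(set(labels))
    correct = 0
    for i, (s, true_label) in enumerate(zip(samples, labels)):
        centroids = {}
        for cls in classes:
            members = [x for j, (x, l) in enumerate(zip(samples, labels))
                       if l == cls and j != i]
            if members:  # skip a class whose only member was held out
                centroids[cls] = [mean(col) for col in zip(*members)]
        pred = min(centroids, key=lambda c: dist(s, centroids[c]))
        correct += pred == true_label
    return correct / len(samples)

def mean_silhouette(samples, labels):
    """Separation check: mean silhouette width of the known groups (needs >= 2 groups).
    Close to 1 = tight, well-separated groups; near 0 or below = overlapping groups."""
    classes = sorted(set(labels))
    widths = []
    for i, (s, li) in enumerate(zip(samples, labels)):
        same = [dist(s, x) for j, (x, l) in enumerate(zip(samples, labels))
                if l == li and j != i]
        if not same:
            widths.append(0.0)  # usual convention for singleton groups
            continue
        a = mean(same)  # mean distance to the sample's own group
        b = min(mean(dist(s, x) for x, l in zip(samples, labels) if l == other)
                for other in classes if other != li)  # mean distance to nearest other group
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(widths)

# --- The "DIY" loop: plug in any strategy function and score it ---

STRATEGIES = {"none": no_norm, "TIC": tic_norm, "median": median_norm}

def benchmark(samples, labels, strategies=STRATEGIES):
    """Return {strategy name: (LOO accuracy, mean silhouette)}; higher is better."""
    results = {}
    for name, normalize in strategies.items():
        normed = [normalize(s) for s in samples]
        results[name] = (loo_centroid_accuracy(normed, labels),
                         mean_silhouette(normed, labels))
    return results

if __name__ == "__main__":
    # Invented toy data: two groups with opposite feature profiles, where each
    # sample also carries an arbitrary overall intensity factor (uneven loading).
    samples = [[1, 2, 3], [10, 20, 30], [5, 10, 15],   # group "A"
               [6, 4, 2], [27, 18, 9], [12, 8, 4]]     # group "B"
    labels = ["A", "A", "A", "B", "B", "B"]
    for name, (acc, sil) in benchmark(samples, labels).items():
        print(f"{name:>6}: LOO accuracy = {acc:.2f}, silhouette = {sil:.2f}")
```

On this toy data, TIC and median normalization both remove the per-sample scale factor and score 1.00 on both metrics, while the raw data score lower on both. Because every strategy in this sketch is a plain function of one sample, testing a new one only requires adding an entry to `STRATEGIES`. Strategies that need dataset-wide information, such as a reference spectrum, would need a different function signature.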