Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States.
Anal Chem. 2019 Oct 1;91(19):12407-12413. doi: 10.1021/acs.analchem.9b02983. Epub 2019 Sep 19.
Liquid chromatography-mass spectrometry (LC-MS)-based metabolomics has emerged as a valuable tool for biological discovery, capable of assaying thousands of diverse chemical entities in a single biospecimen. Processing of nontargeted LC-MS spectral data requires identification and isolation of true spectral features from the random, false noise peaks that comprise a significant portion of total signals, using inexact peak selection algorithms and time-consuming visual inspection of data. To increase the fidelity and speed of data processing, herein we establish, optimize, and evaluate a machine learning pipeline employing deep neural networks as well as a simpler multiple logistic regression model for classification of spectral features from nontargeted LC-MS metabolomics data. Machine learning-based approaches were found to remove up to 90% of false peaks from complex nontargeted LC-MS data sets without reducing true positive signals and exhibit excellent reproducibility across multiple data sets. Application of machine learning for nontargeted LC-MS-based peak selection provides for robust and scalable peak classification and data filtering, enabling handling and processing of large scale, complex metabolomics data sets.
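As a rough illustration of the simpler of the two model families named above, the sketch below fits a logistic regression classifier that separates true peaks from noise peaks and then filters a peak list by predicted probability. It is not the authors' pipeline. The two features (a standardized signal-to-noise score and a peak-shape score), the synthetic data generator, the 30% true-peak rate, and the 0.5 threshold are all assumptions made for the example, since the abstract does not specify the features or hyperparameters used.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.2, epochs=300):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a peak with features x is a true peak."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def make_synthetic_peaks(n, seed=0):
    """Hypothetical data: two standardized features per peak, label 1 = true peak.

    Both the features and their distributions are invented for illustration.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        is_true = rng.random() < 0.3  # assumed: most candidate peaks are noise
        if is_true:
            feats = [rng.gauss(2.0, 1.0), rng.gauss(1.5, 1.0)]
        else:
            feats = [rng.gauss(-1.0, 1.0), rng.gauss(-0.5, 1.0)]
        X.append(feats)
        y.append(1 if is_true else 0)
    return X, y


def filter_peaks(w, b, X, threshold=0.5):
    """Return indices of peaks whose predicted probability meets the threshold."""
    return [i for i, x in enumerate(X) if predict_proba(w, b, x) >= threshold]


if __name__ == "__main__":
    X_train, y_train = make_synthetic_peaks(400, seed=0)
    w, b = train_logistic(X_train, y_train)
    X_new, y_new = make_synthetic_peaks(200, seed=1)
    kept = set(filter_peaks(w, b, X_new))
    false_total = sum(1 for t in y_new if t == 0)
    false_removed = sum(1 for i, t in enumerate(y_new) if t == 0 and i not in kept)
    print(f"kept {len(kept)} of {len(X_new)} peaks; "
          f"removed {false_removed} of {false_total} noise peaks")
```

In practice the threshold would be tuned on held-out, manually labeled peaks, since lowering it keeps more true peaks at the cost of letting more noise through.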