Bao Han, Zhang Xiuqiong, Wang Xinxin, Zhao Jinhui, Zhao Xinjie, Zhao Chunxia, Lu Xin, Xu Guowang
State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China.
University of Chinese Academy of Sciences, Beijing 100049, P. R. China.
Anal Chem. 2025 Jul 15;97(27):14200-14209. doi: 10.1021/acs.analchem.4c06875. Epub 2025 Jun 30.
MS/MS-based untargeted metabolomics generates complex data, but pathway enrichment analysis is constrained by the low annotation rates of metabolic features. Here, we propose MS2MP, a novel deep learning-based framework that predicts KEGG pathways directly from untargeted tandem mass spectrometry (MS/MS) data, eliminating the need for prior metabolite annotation. MS2MP represents MS/MS spectra as fragmentation tree graphs and uses a graph neural network architecture to learn the complex relationships between spectral features and metabolic pathways. Trained on 33,221 experimental MS/MS spectra, MS2MP achieves robust predictive performance, with a balanced accuracy of 94.1% in cross-validation and 87.8%-91.2% on three independent test sets. Notably, MS2MP achieves an "exact match" for 97-98 of 161 tested metabolite standards across diverse experimental conditions, underscoring its reliability and adaptability. Building on these predictions, a novel MS/MS-based pathway enrichment method was developed, and both methods were applied to identify significantly perturbed pathways in transgenic maize. The results uncovered disruptions in phenylpropanoid biosynthesis and related downstream pathways, including those involved in amino acid and secondary metabolite metabolism, which the conventional annotation-based enrichment analysis method overlooked. To the best of our knowledge, MS2MP is the first computational tool capable of directly predicting metabolic pathways from MS/MS spectra. By linking MS/MS-based untargeted metabolomics data to metabolic pathways, MS2MP enables more efficient pathway enrichment analysis, thereby accelerating biological discoveries and enhancing our understanding of complex metabolic networks.
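For readers unfamiliar with the metric reported above, balanced accuracy for a single binary label (here, membership of a spectrum in one pathway) is the mean of sensitivity and specificity. The sketch below is a minimal illustration of that standard definition only. The function name and the toy labels are hypothetical, and the abstract does not state how the authors aggregate the metric across pathways.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for one binary label.

    y_true, y_pred: equal-length sequences of 0/1 values
    (1 = spectrum belongs to the pathway, 0 = it does not).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Toy data (invented for illustration): 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
```

On these toy labels, plain accuracy is 8/10, while balanced accuracy is the mean of 2/3 and 6/7. Because each class is weighted equally, the metric is less flattering than plain accuracy when most spectra do not belong to a given pathway.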
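As background for the enrichment step, the conventional over-representation test asks whether a pathway contains more significant features than expected by chance, using a hypergeometric tail probability. The sketch below shows that conventional test only. It is not the enrichment method introduced in the paper, whose details are not given in the abstract, and the function name and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, N, K, n):
    """Hypergeometric tail probability P(X >= k).

    N: number of features in the background set
    K: number of background features assigned to the pathway
    n: number of significant (perturbed) features
    k: number of significant features assigned to the pathway
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper pathway hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy counts (invented): 10 features, 4 in the pathway,
# 3 significant, 2 of which fall in the pathway.
p = enrichment_p_value(k=2, N=10, K=4, n=3)
```

In an annotation-based workflow, the pathway assignments come from identified metabolites; a spectrum-based workflow would presumably take them from predicted pathway labels instead, which is an assumption about usage rather than a description of the published procedure.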