Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China.
University of Chinese Academy of Sciences, Beijing 100049, China.
Anal Chem. 2024 Oct 22;96(42):16871-16881. doi: 10.1021/acs.analchem.4c03724. Epub 2024 Oct 14.
A pivotal challenge in metabolite research is the structural annotation of metabolites from tandem mass spectrometry (MS/MS) data. The integration of artificial intelligence (AI) has revolutionized the interpretation of MS data, facilitating the identification of elusive metabolites within the metabolomics landscape. Recent methods focus primarily on transforming MS/MS spectra or molecular structures into a unified modality to enable similarity-based comparison and interpretation. In this work, we present CMSSP, a novel Contrastive Mass Spectra-Structure Pretraining framework designed for metabolite annotation. The primary objective of CMSSP is to establish a representation space that allows direct comparison between MS/MS spectra and molecular structures, overcoming the limitations of distinct modalities. Evaluation on two benchmark test sets demonstrates the efficacy of the approach. CMSSP outperformed state-of-the-art methods in annotation accuracy by a significant margin. Specifically, it improved the top-1 accuracy by 30% on the CASMI 2017 data set and achieved a 16% increase in top-10 accuracy on an independent test set. Moreover, the model displayed superior identification accuracy across all seven chemical categories, showcasing its robustness and versatility. Finally, the MS/MS data of 30 metabolites were analyzed, achieving top-1 and top-3 accuracies of 86.7% and 100%, respectively. The CMSSP model serves as a potent tool for the dissection and interpretation of intricate MS/MS data, propelling the field toward more accurate and efficient metabolite annotation. This not only augments the analytical capabilities of metabolomics but also paves the way for future discoveries in understanding complex biological systems.
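The idea of a shared representation space with top-k ranking can be illustrated with a minimal sketch. The abstract does not specify the CMSSP loss, encoders, or hyperparameters, so everything below is an assumption for illustration: a symmetric CLIP-style InfoNCE loss over paired spectrum and structure embeddings, cosine-similarity ranking of candidates, and hypothetical names (`contrastive_loss`, `top_k_accuracy`, `temperature=0.07`). It is not the authors' implementation.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(z):
    # Numerically stable row-wise log-softmax.
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def contrastive_loss(spec_emb, mol_emb, temperature=0.07):
    # Assumed symmetric InfoNCE loss: row i of spec_emb is paired with
    # row i of mol_emb; all other rows in the batch act as negatives.
    s, m = l2_normalize(spec_emb), l2_normalize(mol_emb)
    logits = s @ m.T / temperature
    idx = np.arange(len(logits))
    loss_s2m = -log_softmax(logits)[idx, idx].mean()    # spectrum -> structure
    loss_m2s = -log_softmax(logits.T)[idx, idx].mean()  # structure -> spectrum
    return 0.5 * (loss_s2m + loss_m2s)

def top_k_accuracy(spec_emb, cand_emb, true_idx, k):
    # Rank candidate structures by cosine similarity to each spectrum and
    # count a hit when the true structure is among the k best.
    sims = l2_normalize(spec_emb) @ l2_normalize(cand_emb).T
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = (topk == np.asarray(true_idx)[:, None]).any(axis=1)
    return float(hits.mean())
```

In this sketch, training would lower `contrastive_loss` by pulling matched spectrum and structure embeddings together; annotation then reduces to ranking candidate structures for a query spectrum, which is what the top-1, top-3, and top-10 accuracies measure.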