How Well Can We Predict Mass Spectra from Structures? Benchmarking Competitive Fragmentation Modeling for Metabolite Identification on Untrained Tandem Mass Spectra.

Affiliations

Department of Chemistry, University of California Davis, Davis, California 95616, United States.

West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California Davis, Davis, California 95616, United States.

Publication information

J Chem Inf Model. 2022 Sep 12;62(17):4049-4056. doi: 10.1021/acs.jcim.2c00936. Epub 2022 Aug 31.

Abstract

Competitive Fragmentation Modeling for Metabolite Identification (CFM-ID) is a machine learning tool to predict in silico tandem mass spectra (MS/MS) for known or suspected metabolites for which chemical reference standards are not available. As a machine learning tool, it relies on both an underlying statistical model and an explicit training set that encompasses experimental mass spectra for specific compounds. Such mass spectra depend on specific parameters such as collision energies, instrument types, and adducts which are accumulated in libraries. Yet, ultimately prediction tools that are meant to cover wide expanses of entities must be validated on cases that were not included in the initial training and testing sets. Hence, we here benchmarked the performance of CFM-ID 4.0 to correctly predict MS/MS spectra for spectra that were not included in the CFM-ID training set and for different mass spectrometry conditions. We used 609,456 experimental tandem spectra from the NIST20 mass spectral library that were newly added to the previous NIST17 library version. We found that CFM-ID's highest energy prediction output would maximize the capacity for library generation. Matching the experimental collision energy with CFM-ID's prediction energy produced the best results, even for HCD-Orbitrap instruments. For benzenoids, better MS/MS predictions were achieved than for heterocyclic compounds. However, when exploring CFM-ID's performance on 8,305 compounds at 40 eV HCD-Orbitrap collision energy, >90% of the 20/80 split test compounds showed <700 MS/MS similarity score. Instead of a stand-alone tool, CFM-ID 4.0 might be useful to boost candidate structures in the greater context of identification workflows.
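The abstract reports results as an MS/MS similarity score with a threshold of 700, but it does not say how that score is computed. As an illustration only, the sketch below assumes a plain cosine (normalized dot-product) similarity between binned spectra, scaled to 0-1000. The function names, the 0.01 m/z bin width, and the scaling are assumptions for this example and are not taken from the paper, CFM-ID, or the NIST software.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks that fall into the same m/z bin.

    bin_width is a hypothetical choice for this sketch, not a value from the paper.
    """
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(experimental, predicted, bin_width=0.01, scale=1000):
    """Cosine similarity between two peak lists, scaled to the range 0..scale."""
    a = bin_spectrum(experimental, bin_width)
    b = bin_spectrum(predicted, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return scale * dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up peak lists, purely for demonstration.
    experimental = [(91.05, 100.0), (65.04, 50.0)]
    predicted = [(91.05, 100.0)]
    score = cosine_score(experimental, predicted)
    print(f"similarity: {score:.1f}", "(below 700)" if score < 700 else "(700 or above)")
```

Under this assumed metric, a predicted spectrum that misses or misplaces major fragment peaks drops quickly below the 700 mark, which is the kind of shortfall the abstract describes for most held-out compounds.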
