Current and future deep learning algorithms for tandem mass spectrometry (MS/MS)-based small molecule structure elucidation.

Authors

Liu Youzhong, De Vijlder Thomas, Bittremieux Wout, Laukens Kris, Heyndrickx Wouter

Affiliations

Janssen Research & Development, Beerse, Belgium.

University of Antwerp, Antwerp, Belgium.

Publication information

Rapid Commun Mass Spectrom. 2025 May;39 Suppl 1:e9120. doi: 10.1002/rcm.9120. Epub 2021 May 26.

DOI: 10.1002/rcm.9120
PMID: 33955607
Abstract

RATIONALE

Structure elucidation of small molecules has been one of the cornerstone applications of mass spectrometry for decades. Despite the increasing availability of software tools, structure elucidation from tandem mass spectrometry (MS/MS) data remains a challenging task, leaving many spectra unidentified. However, as an increasing number of reference MS/MS spectra are being curated at a repository scale and shared on public servers, there is an exciting opportunity to develop powerful new deep learning (DL) models for automated structure elucidation.

ARCHITECTURES

Recent early-stage DL frameworks mostly follow a "two-step approach" that first predicts molecular descriptors from MS/MS spectra and then translates those descriptors into database structures. The related architectures could suffer from: (1) computational complexity because descriptor-specific classifiers are trained separately, (2) the high-dimensional nature of mass spectral data and information loss due to data preprocessing, and (3) the low substructure coverage and class imbalance of predefined molecular fingerprints. Inspired by successful DL frameworks employed in drug discovery, we have conceptualized and designed hypothetical DL architectures to tackle the above issues. For (1), we recommend multitask learning to achieve better performance with fewer classifiers by grouping structurally related descriptors. For (2) and (3), we introduce feature engineering to extract condensed and higher-order information from spectra and structure data. For instance, encoding spectra with subtrees and pre-calculated spectral patterns adds peak interactions to the model input. Encoding structures with graph convolutional networks incorporates connectivity within a molecule. The joint embedding of spectra and structures can enable simultaneous spectral library and molecular database search.
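To make the idea of encoding structures with graph convolutions more concrete, the following is a minimal sketch and not the architecture proposed in the paper. It assumes a simplified mean-aggregation graph convolution with self-loops, one-hot atom-type features, a fixed identity weight matrix in place of learned weights, and a sum readout. All names (`one_hot_atoms`, `neighbor_lists`, `graph_conv_layer`, `readout`) and the toy molecule (the heavy atoms of ethanol, C-C-O) are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def one_hot_atoms(atoms: Sequence[str], vocabulary: Sequence[str]) -> Matrix:
    """Encode each atom as a one-hot vector over a small element vocabulary."""
    return [[1.0 if atom == symbol else 0.0 for symbol in vocabulary]
            for atom in atoms]


def neighbor_lists(n_atoms: int, bonds: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Build per-atom neighbor lists; each atom also lists itself (self-loop)."""
    neighbors = [[i] for i in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def graph_conv_layer(features: Matrix,
                     neighbors: List[List[int]],
                     weights: Matrix) -> Matrix:
    """One simplified graph convolution: average the features of each atom
    and its bonded neighbors, apply a linear map, then a ReLU."""
    in_dim = len(features[0])
    out_dim = len(weights[0])
    updated = []
    for nbrs in neighbors:
        mean = [sum(features[j][k] for j in nbrs) / len(nbrs)
                for k in range(in_dim)]
        linear = [sum(mean[k] * weights[k][m] for k in range(in_dim))
                  for m in range(out_dim)]
        updated.append([max(0.0, value) for value in linear])
    return updated


def readout(features: Matrix) -> List[float]:
    """Sum atom vectors into one fixed-length molecule vector."""
    return [sum(column) for column in zip(*features)]


# Toy input (assumption): heavy atoms of ethanol, C-C-O.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
vocabulary = ["C", "O"]

# Identity weights stand in for parameters a real model would learn.
weights = [[1.0, 0.0],
           [0.0, 1.0]]

features = one_hot_atoms(atoms, vocabulary)
hidden = graph_conv_layer(features, neighbor_lists(len(atoms), bonds), weights)
molecule_vector = readout(hidden)

print(hidden)
print(molecule_vector)
```

After one layer, the middle carbon's vector already mixes in information from the oxygen it is bonded to, which is the sense in which a graph convolution incorporates connectivity. A predefined fingerprint would instead report only whether fixed substructures are present.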

CONCLUSIONS

In principle, given enough training data, adapted DL architectures, optimal hyperparameters and computing power, DL frameworks can predict small molecule structures, completely or at least partially, from MS/MS spectra. However, their performance and general applicability should be fairly evaluated against classical machine learning frameworks.
