Current and future deep learning algorithms for tandem mass spectrometry (MS/MS)-based small molecule structure elucidation.

Authors

Liu Youzhong, De Vijlder Thomas, Bittremieux Wout, Laukens Kris, Heyndrickx Wouter

Affiliations

Janssen Research & Development, Beerse, Belgium.

University of Antwerp, Antwerp, Belgium.

Publication information

Rapid Commun Mass Spectrom. 2025 May;39 Suppl 1:e9120. doi: 10.1002/rcm.9120. Epub 2021 May 26.

Abstract

RATIONALE

Structure elucidation of small molecules has been one of the cornerstone applications of mass spectrometry for decades. Despite the increasing availability of software tools, structure elucidation from tandem mass spectrometry (MS/MS) data remains a challenging task, leaving many spectra unidentified. However, as an increasing number of reference MS/MS spectra are being curated at a repository scale and shared on public servers, there is an exciting opportunity to develop powerful new deep learning (DL) models for automated structure elucidation.

ARCHITECTURES

Recent early-stage DL frameworks mostly follow a "two-step approach" that translates MS/MS spectra to database structures after first predicting molecular descriptors. The related architectures could suffer from: (1) computational complexity because of the separate training of descriptor-specific classifiers, (2) the high-dimensional nature of mass spectral data and information loss due to data preprocessing, and (3) the low substructure coverage and class imbalance of predefined molecular fingerprints. Inspired by successful DL frameworks employed in drug discovery, we have conceptualized and designed hypothetical DL architectures to tackle the above issues. For (1), we recommend multitask learning to achieve better performance with fewer classifiers by grouping structurally related descriptors. For (2) and (3), we introduce feature engineering to extract condensed and higher-order information from spectra and structure data. For instance, encoding spectra with subtrees and pre-calculated spectral patterns adds peak interactions to the model input. Encoding structures with graph convolutional networks incorporates connectivity within a molecule. The joint embedding of spectra and structures can enable simultaneous spectral library and molecular database search.
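
To make the graph convolution idea concrete, the following is a minimal pure-Python sketch of a single propagation step, not the authors' architecture. The abstract does not say which graph convolution variant is intended; the sketch assumes the widely used symmetrically normalized rule ReLU(D^-1/2 (A + I) D^-1/2 H W). The function names (`matmul`, `gcn_layer`, `mean_pool`), the one-hot atom features, the identity weight matrix and the ethanol example are all illustrative assumptions.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so every atom also keeps its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # Symmetric normalization by the degree of each atom (self-loop included).
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    a_norm = [
        [inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]
    # Mix each atom's features with those of its bonded neighbours,
    # apply the linear map, then the ReLU non-linearity.
    transformed = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, x) for x in row] for row in transformed]


def mean_pool(node_features):
    """Average atom vectors into one fixed-length molecule vector."""
    n = len(node_features)
    return [sum(row[j] for row in node_features) / n
            for j in range(len(node_features[0]))]


# Hypothetical example: heavy atoms of ethanol, C-C-O.
ethanol_adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
ethanol_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # one-hot [C, O]
identity_weights = [[1.0, 0.0], [0.0, 1.0]]  # stands in for learned weights

atom_vectors = gcn_layer(ethanol_adjacency, ethanol_features, identity_weights)
molecule_vector = mean_pool(atom_vectors)
```

After one step, the terminal carbon still has a zero oxygen component, whereas the central carbon has picked up a non-zero one from its bonded oxygen. This is the sense in which the encoding incorporates connectivity. In a trained model the weights would be learned, and several such layers would be stacked before pooling.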

CONCLUSIONS

In principle, given enough training data, adapted DL architectures, optimal hyperparameters and computing power, DL frameworks can predict small molecule structures, completely or at least partially, from MS/MS spectra. However, their performance and general applicability should be fairly evaluated against classical machine learning frameworks.
