An end-to-end deep learning framework for translating mass spectra to de-novo molecules.

Authors

Litsa Eleni E, Chenthamarakshan Vijil, Das Payel, Kavraki Lydia E

Affiliations

Department of Computer Science, Rice University, Houston, TX, USA.

IBM Research, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA.

Publication

Commun Chem. 2023 Jun 23;6(1):132. doi: 10.1038/s42004-023-00932-3.

DOI:10.1038/s42004-023-00932-3
PMID:37353554
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10290119/
Abstract

Elucidating the structure of a chemical compound is a fundamental task in chemistry with applications in multiple domains including drug discovery, precision medicine, and biomarker discovery. The common practice for elucidating the structure of a compound is to obtain a mass spectrum and subsequently retrieve its structure from spectral databases. However, these methods fail for novel molecules that are not present in the reference database. We propose Spec2Mol, a deep learning architecture for molecular structure recommendation given mass spectra alone. Spec2Mol is inspired by the Speech2Text deep learning architectures for translating audio signals into text. Our approach is based on an encoder-decoder architecture. The encoder learns the spectra embeddings, while the decoder, pre-trained on a massive dataset of chemical structures for translating between different molecular representations, reconstructs SMILES sequences of the recommended chemical structures. We have evaluated Spec2Mol by assessing the molecular similarity between the recommended structures and the original structure. Our analysis showed that Spec2Mol is able to identify the presence of key molecular substructures from its mass spectrum, and shows on par performance, when compared to existing fragmentation tree methods particularly when test structure information is not available during training or present in the reference database.
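The evaluation described above compares each recommended structure with the original one by molecular similarity. The abstract does not say which similarity measure was used, so the sketch below assumes the Tanimoto (Jaccard) coefficient over sets of substructure features, one common choice. The feature names are hypothetical placeholders, not output from Spec2Mol or from any cheminformatics toolkit.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets.

    Returns |a & b| / |a | b|, a value in [0, 1].
    Two empty sets are treated as identical (similarity 1.0).
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical substructure features, for illustration only.
reference = {"benzene_ring", "carboxylic_acid", "hydroxyl", "ester"}
recommended = {"benzene_ring", "carboxylic_acid", "hydroxyl", "amine"}

score = tanimoto(reference, recommended)
print(f"Tanimoto similarity: {score:.2f}")  # 3 shared / 5 total -> 0.60
```

A score near 1 means the recommended structure shares most of its substructure features with the original, which is the sense in which a recommendation can recover key substructures without matching the molecule exactly.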


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/855e/10290119/bd3313a71f05/42004_2023_932_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/855e/10290119/2316afc77f21/42004_2023_932_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/855e/10290119/d04144f9a32c/42004_2023_932_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/855e/10290119/c7fdd55e7531/42004_2023_932_Fig4_HTML.jpg
