


State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis.

Affiliations

Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany.

BIGCHEM GmbH, Valerystr. 49, D-85716, Unterschleißheim, Germany.

Publication details

Nat Commun. 2020 Nov 4;11(1):5575. doi: 10.1038/s41467-020-19266-y.

DOI: 10.1038/s41467-020-19266-y
PMID: 33149154
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7643129/
Abstract

We investigated the effect of different training scenarios on predicting the (retro)synthesis of chemical compounds using text-like representation of chemical reactions (SMILES) and Natural Language Processing (NLP) neural network Transformer architecture. We showed that data augmentation, which is a powerful method used in image processing, eliminated the effect of data memorization by neural networks and improved their performance for prediction of new sequences. This effect was observed when augmentation was applied to the input and the target data simultaneously. The top-5 accuracy was 84.8% for the prediction of the largest fragment (thus identifying principal transformation for classical retro-synthesis) for the USPTO-50k test dataset, and was achieved by a combination of SMILES augmentation and a beam search algorithm. The same approach provided significantly better results for the prediction of direct reactions from the single-step USPTO-MIT test set. Our model achieved 90.6% top-1 and 96.1% top-5 accuracy for its challenging mixed set and 97% top-5 accuracy for the USPTO-MIT separated set. It also significantly improved results for USPTO-full set single-step retrosynthesis for both top-1 and top-10 accuracies. The appearance frequency of the most abundantly generated SMILES was well correlated with the prediction outcome and can be used as a measure of the quality of reaction prediction.
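The confidence measure described in the abstract — pooling beam-search candidates generated for several augmented SMILES of the same input and ranking them by appearance frequency — can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the candidate lists are assumed to come from the model's beam search, and SMILES strings are treated here as opaque tokens (a real pipeline would canonicalize them with a cheminformatics toolkit such as RDKit before counting).

```python
from collections import Counter

def rank_by_frequency(candidate_lists):
    """Pool beam-search candidates produced for several augmented
    versions of the same input and rank them by how often each
    candidate appears. The count of the top candidate serves as a
    confidence score for the prediction."""
    counts = Counter(smi for beam in candidate_lists for smi in beam)
    return counts.most_common()

def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of examples whose true product/reactant appears among
    the k most frequent pooled candidates."""
    hits = sum(
        1 for ranked, true in zip(ranked_predictions, targets)
        if true in [smi for smi, _ in ranked[:k]]
    )
    return hits / len(targets)

# Hypothetical beams from three augmented SMILES of the same molecule:
beams = [
    ["CCO", "OCC", "C(C)O"],
    ["CCO", "OCC", "CCN"],
    ["CCO", "C(C)O", "CCN"],
]
ranked = rank_by_frequency(beams)
# "CCO" appears in all three beams, so it ranks first with count 3.
```

The count attached to the top-ranked candidate is exactly the "appearance frequency of the most abundantly generated SMILES" that the abstract reports as correlating with prediction quality.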


Figures (Fig. 1-6):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a74/7643129/83a95bd0f5f3/41467_2020_19266_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a74/7643129/11cbe6898ccb/41467_2020_19266_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a74/7643129/cb784fdafb97/41467_2020_19266_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a74/7643129/a7ec35e299d0/41467_2020_19266_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a74/7643129/f4c4e84d1908/41467_2020_19266_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a74/7643129/ae87a799bc1f/41467_2020_19266_Fig6_HTML.jpg

Similar articles

1. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat Commun. 2020 Nov 4;11(1):5575. doi: 10.1038/s41467-020-19266-y.
2. Transfer Learning: Making Retrosynthetic Predictions Based on a Small Chemical Reaction Dataset Scale to a New Level. Molecules. 2020 May 19;25(10):2357. doi: 10.3390/molecules25102357.
3. G2GT: Retrosynthesis Prediction with Graph-to-Graph Attention Neural Network and Self-Training. J Chem Inf Model. 2023 Apr 10;63(7):1894-1905. doi: 10.1021/acs.jcim.2c01302.
4. CNN-based two-branch multi-scale feature extraction network for retrosynthesis prediction. BMC Bioinformatics. 2022 Sep 2;23(1):362. doi: 10.1186/s12859-022-04904-7.
5. Unified Deep Learning Model for Multitask Reaction Predictions with Explanation. J Chem Inf Model. 2022 Mar 28;62(6):1376-1387. doi: 10.1021/acs.jcim.1c01467.
6. Ualign: pushing the limit of template-free retrosynthesis prediction with unsupervised SMILES alignment. J Cheminform. 2024 Jul 15;16(1):80. doi: 10.1186/s13321-024-00877-2.
7. Reagent prediction with a molecular transformer improves reaction data quality. Chem Sci. 2023 Mar 1;14(12):3235-3246. doi: 10.1039/d2sc06798f.
8. Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nat Commun. 2023 May 25;14(1):3009. doi: 10.1038/s41467-023-38851-5.
9. RetroCaptioner: beyond attention in end-to-end retrosynthesis transformer via contrastively captioned learnable graph representation. Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae561.
10. RPBP: Deep Retrosynthesis Reaction Prediction Based on Byproducts. J Chem Inf Model. 2023 Oct 9;63(19):5956-5970. doi: 10.1021/acs.jcim.3c00274.

Cited by

1. BatGPT-Chem: A Foundation Large Model for Chemical Engineering. Research (Wash D C). 2025 Sep 10;8:0827. doi: 10.34133/research.0827.
2. Predicting reaction conditions: a data-driven perspective. Chem Sci. 2025 Aug 6. doi: 10.1039/d5sc03045e.
3. Enhancing deep chemical reaction prediction with advanced chirality and fragment representation. Chem Commun (Camb). 2025 Sep 11. doi: 10.1039/d5cc02641e.
4. Going beyond SMILES enumeration for data augmentation in generative drug discovery. Digit Discov. 2025 Aug 14. doi: 10.1039/d5dd00028a.
5. Graph-sequence enhanced transformer for template-free prediction of natural product biosynthesis. Patterns (N Y). 2025 Apr 30;6(8):101259. doi: 10.1016/j.patter.2025.101259.
6. Electron flow matching for generative reaction mechanism prediction. Nature. 2025 Aug 20. doi: 10.1038/s41586-025-09426-9.
7. HiCLR: Knowledge-Induced Hierarchical Contrastive Learning with Retrosynthesis Prediction Yields a Reaction Foundation Model. JACS Au. 2025 Jun 25;5(7):3140-3155. doi: 10.1021/jacsau.5c00289.
8. RSGPT: a generative transformer model for retrosynthesis planning pre-trained on ten billion datapoints. Nat Commun. 2025 Jul 31;16(1):7012. doi: 10.1038/s41467-025-62308-6.
9. Transfer Learning for Heterocycle Retrosynthesis. J Chem Inf Model. 2025 Aug 11;65(15):7851-7861. doi: 10.1021/acs.jcim.4c02041.
10. A review of transformer models in drug discovery and beyond. J Pharm Anal. 2025 Jun;15(6):101081. doi: 10.1016/j.jpha.2024.101081.

References

1. Text Data Augmentation for Deep Learning. J Big Data. 2021;8(1):101. doi: 10.1186/s40537-021-00492-0.
2. Automatic retrosynthetic route planning using template-free models. Chem Sci. 2020 Mar 3;11(12):3355-3364. doi: 10.1039/c9sc03666k.
3. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chem Sci. 2020 Mar 3;11(12):3316-3325. doi: 10.1039/c9sc05704h.
4. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J Cheminform. 2020 Mar 18;12(1):17. doi: 10.1186/s13321-020-00423-w.
5. Data Augmentation and Pretraining for Template-Based Retrosynthetic Prediction in Computer-Aided Synthesis Planning. J Chem Inf Model. 2020 Jul 27;60(7):3398-3407. doi: 10.1021/acs.jcim.0c00403.
6. QSAR without borders. Chem Soc Rev. 2020 Jun 7;49(11):3525-3564. doi: 10.1039/d0cs00098a.
7. Current and Future Roles of Artificial Intelligence in Medicinal Chemistry Synthesis. J Med Chem. 2020 Aug 27;63(16):8667-8682. doi: 10.1021/acs.jmedchem.9b02120.
8. Predicting Retrosynthetic Reactions Using Self-Corrected Transformer Neural Networks. J Chem Inf Model. 2020 Jan 27;60(1):47-55. doi: 10.1021/acs.jcim.9b00949.
9. Prediction and Interpretable Visualization of Retrosynthetic Reactions Using Graph Convolutional Networks. J Chem Inf Model. 2019 Dec 23;59(12):5026-5033. doi: 10.1021/acs.jcim.9b00538.
10. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Cent Sci. 2019 Sep 25;5(9):1572-1583. doi: 10.1021/acscentsci.9b00576.