RSGPT: a generative transformer model for retrosynthesis planning pre-trained on ten billion datapoints.

Author information

Deng Yafeng, Zhao Xinda, Sun Hanyu, Chen Yu, Wang Xiaorui, Xue Xi, Li Liangning, Song Jianfei, Hsieh Chang-Yu, Hou Tingjun, Pan Xiandao, Alomar Taghrid Saad, Ji Xiangyang, Wang Xiaojian

Affiliations

Department of Automation, Tsinghua University, Beijing, China.

Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, China.

Publication information

Nat Commun. 2025 Jul 31;16(1):7012. doi: 10.1038/s41467-025-62308-6.

DOI:10.1038/s41467-025-62308-6
PMID:40744941
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12314115/
Abstract

Retrosynthesis planning is a crucial task in organic synthesis, and deep-learning methods have enhanced and accelerated this process. With the emergence of large language models, the demand for training data is growing rapidly. However, available retrosynthesis data amount to only millions of reactions. We therefore pioneer the use of a template-based algorithm to generate chemical reaction data, producing over 10 billion reaction datapoints. We then develop a generative pretrained transformer model for template-free retrosynthesis planning by pre-training on these 10 billion generated datapoints. Inspired by the strategies used for large language models, we introduce reinforcement learning to capture the relationships among products, reactants, and templates more accurately. Experiments demonstrate that our model achieves state-of-the-art performance on the benchmark, with a Top-1 accuracy of 63.4%, substantially outperforming previous models.
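The abstract reports its headline result as Top-1 accuracy. In single-step retrosynthesis benchmarks this is commonly computed as the fraction of test products for which the model's highest-ranked prediction exactly matches the recorded reactants; Top-k extends the check to the first k candidates. The sketch below illustrates only that general metric. It is not the authors' evaluation code, and the function names `canonical_reactant_set` and `top_k_accuracy` are hypothetical. It assumes every SMILES string has already been canonicalized with a cheminformatics toolkit (not shown), so matching reduces to comparing order-insensitive sets of molecule strings. The example data are toy strings, not benchmark results.

```python
from typing import Sequence


def canonical_reactant_set(smiles: str) -> frozenset[str]:
    """Split a multi-molecule SMILES on '.' so that reactant order does not matter.

    ASSUMPTION: each component is already canonical (e.g. canonicalized
    beforehand with a cheminformatics toolkit); no chemistry is done here.
    """
    return frozenset(part for part in smiles.split(".") if part)


def top_k_accuracy(predictions: Sequence[Sequence[str]],
                   targets: Sequence[str],
                   k: int) -> float:
    """Fraction of products whose true reactant set appears among the first k predictions."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one example")
    hits = 0
    for ranked, target in zip(predictions, targets):
        truth = canonical_reactant_set(target)
        # A hit if any of the top-k ranked candidates matches the recorded reactants.
        if any(canonical_reactant_set(p) == truth for p in ranked[:k]):
            hits += 1
    return hits / len(targets)


# Toy ranked predictions for three hypothetical test products.
preds = [
    ["CCO.CC(=O)O", "CCO"],              # correct at rank 1 (order-insensitive)
    ["c1ccccc1", "CC(=O)Cl.c1ccccc1"],   # correct only at rank 2
    ["C", "CC"],                         # never correct
]
truth = ["CC(=O)O.CCO", "c1ccccc1.CC(=O)Cl", "CCN"]

print(top_k_accuracy(preds, truth, k=1))  # one hit out of three
print(top_k_accuracy(preds, truth, k=2))  # two hits out of three
```

Using a set also ignores repeated components, so `A.A` and `A` would count as the same answer; a stricter protocol could compare sorted lists of components instead.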

Figures (PMC):
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc1d/12314115/76ea21e6adaf/41467_2025_62308_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc1d/12314115/22d864162f77/41467_2025_62308_Fig2_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc1d/12314115/6f79305e071d/41467_2025_62308_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc1d/12314115/c11ebeb2c7de/41467_2025_62308_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc1d/12314115/3d7794eab6f9/41467_2025_62308_Fig6_HTML.jpg
