
RetroCaptioner: beyond attention in end-to-end retrosynthesis transformer via contrastively captioned learnable graph representation.

Affiliations

School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 102488, China.

Ministry of Education, Engineering Research Center for Pharmaceutics of Chinese Materia Medica and New Drug Development, Beijing, 100102, China.

Publication information

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae561.

Abstract

MOTIVATION

Retrosynthesis identifies available precursor molecules for diverse and novel compounds. With advances in language models and their growing practicality, Transformer-based models have increasingly been used to automate this process. However, many existing methods struggle to efficiently capture reaction transformation information, which limits the accuracy and applicability of their predictions.

RESULTS

We introduce RetroCaptioner, an advanced end-to-end, Transformer-based framework featuring a Contrastive Reaction Center Captioner. This captioner guides the training of dual-view attention models using a contrastive learning approach. It leverages learned molecular graph representations to capture chemically plausible constraints within a single-step learning process. We integrate the single-encoder, dual-encoder, and encoder-decoder paradigms to effectively fuse information from the sequence and graph representations of molecules. This involves modifying the Transformer encoder into a uni-view sequence encoder and a dual-view module. Furthermore, we enhance the captioning of atomic correspondence between SMILES strings and graphs. RetroCaptioner achieved 67.2% top-1 and 93.4% top-10 exact-match accuracy on the USPTO-50k dataset, alongside a SMILES validity score of 99.4%. In addition, RetroCaptioner has demonstrated its reliability by generating synthetic routes for the drug protokylol.
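The contrastive alignment of a molecule's two views (SMILES sequence and graph) described above can be illustrated with an InfoNCE-style loss. This is a minimal sketch of the general technique, not the authors' implementation; the function name, embedding shapes, and temperature value are assumptions for illustration.

```python
import numpy as np

def info_nce(seq_emb, graph_emb, temperature=0.1):
    """Contrastive (InfoNCE-style) loss aligning two views of the same batch.

    seq_emb, graph_emb: (N, d) arrays; row i of each is the same molecule
    embedded from its SMILES sequence and from its graph, respectively.
    Matching pairs (the diagonal) are pulled together; mismatched pairs
    are pushed apart.
    """
    # L2-normalise so dot products become cosine similarities
    s = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    logits = s @ g.T / temperature              # (N, N) similarity matrix
    # numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy with the matching pair (diagonal) as the target class
    return -np.mean(np.diag(log_prob))
```

When the two views of each molecule embed close together and away from other molecules, the loss approaches zero; misaligned pairings drive it up toward log(N).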

AVAILABILITY AND IMPLEMENTATION

The code and data are available at https://github.com/guofei-tju/RetroCaptioner.

