
G2GT: Retrosynthesis Prediction with Graph-to-Graph Attention Neural Network and Self-Training.

Authors

Lin Zaiyun, Yin Shiqiu, Shi Lei, Zhou Wenbiao, Zhang Yingsheng John

Affiliation

Stone Wise, Room 918, Eighth Floor, Building 1, No. 6 Danling Street, Haidian District, Beijing, China 100089.

Publication

J Chem Inf Model. 2023 Apr 10;63(7):1894-1905. doi: 10.1021/acs.jcim.2c01302. Epub 2023 Mar 22.

Abstract

Retrosynthesis prediction, the task of identifying reactant molecules that can be used to synthesize product molecules, is a fundamental challenge in organic chemistry and related fields. To address this challenge, we propose a novel graph-to-graph transformation model, G2GT. The model is built on the standard transformer structure and utilizes graph encoders and decoders. Additionally, we demonstrate the effectiveness of self-training, a data augmentation technique that utilizes unlabeled molecular data, in improving the performance of the model. To further enhance diversity, we propose a weak ensemble method, inspired by reaction-type labels and ensemble learning. This method incorporates beam search, nucleus sampling, and top-k sampling to improve inference diversity. A simple ranking algorithm is employed to retrieve the final top-10 results. We achieved new state-of-the-art results on both the USPTO-50K data set, with a top-1 accuracy of 54%, and the larger, more challenging USPTO-Full data set, with a top-1 accuracy of 49.3% and competitive top-10 results. Our model can also be generalized to all other graph-to-graph transformation tasks. Data and code are available at https://github.com/Anonnoname/G2GT_2.
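The decoding strategies named in the abstract (nucleus, i.e. top-p, sampling and top-k sampling) are standard techniques for diversifying sequence-model outputs. As a minimal generic sketch of how the two filters restrict a model's next-token distribution before sampling (this is illustrative decoding code, not the authors' implementation; the function name and logit values are hypothetical):

```python
import numpy as np

def top_k_top_p_filter(logits, k=5, p=0.9):
    """Keep the top-k logits, then further restrict to the smallest
    set of tokens whose cumulative probability exceeds p (nucleus)."""
    logits = np.asarray(logits, dtype=float)
    # Top-k: mask everything below the k-th largest logit.
    kth = np.sort(logits)[-k]
    mask = logits < kth
    # Nucleus (top-p): sort by probability and keep the smallest
    # prefix whose cumulative mass exceeds p.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1  # number of tokens kept
    mask[order[cutoff:]] = True
    # Renormalize the surviving logits into a sampling distribution.
    filtered = logits.copy()
    filtered[mask] = -np.inf
    dist = np.exp(filtered - filtered[~mask].max())
    return dist / dist.sum()

rng = np.random.default_rng(0)
dist = top_k_top_p_filter([2.0, 1.0, 0.5, 0.1, -1.0, -2.0], k=4, p=0.8)
token = rng.choice(len(dist), p=dist)  # sample one surviving token
```

Running several such decoders with different (k, p) settings and merging their candidate lists is one common way to realize the kind of weak ensemble the abstract describes.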

