Bidirectional Molecule Generation with Recurrent Neural Networks.

Affiliation

Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland.

Publication information

J Chem Inf Model. 2020 Mar 23;60(3):1175-1183. doi: 10.1021/acs.jcim.9b00943. Epub 2020 Jan 16.

Abstract

Recurrent neural networks (RNNs) can generate de novo molecular designs from simplified molecular-input line-entry system (SMILES) string representations of chemical structures. RNN-based structure generation is usually performed unidirectionally, by growing SMILES strings from left to right. However, a small molecule has no natural start or end, and SMILES strings are intrinsically nonunivocal representations of molecular graphs: the same molecule can be written as several different strings. These properties motivate bidirectional structure generation. Here, bidirectional generative RNNs for SMILES-based molecule design are introduced. To this end, two established bidirectional methods were implemented, and a new method for SMILES string generation and data augmentation is introduced: bidirectional molecule design by alternate learning (BIMODAL). These three bidirectional strategies were compared with the unidirectional forward RNN approach for SMILES string generation in terms of the (i) novelty, (ii) scaffold diversity, and (iii) chemical-biological relevance of the computer-generated molecules. The results support bidirectional strategies for SMILES-based molecular de novo design, with BIMODAL outperforming the unidirectional forward RNN on most criteria under the tested conditions. The code and the pretrained models are available at https://github.com/ETHmodlab/BIMODAL.
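
To make the idea of bidirectional string growth concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a start token "G" placed in the middle of the sequence and an end token "E" that closes one side; both token names, the tiny vocabulary, and the helper names `sample_token` and `grow_bidirectional` are hypothetical. The trained RNN is replaced by a uniform random sampler, so the output is generally not a valid SMILES string; only the alternating left/right growth pattern is illustrated.

```python
import random

# Hypothetical toy vocabulary; "E" is assumed to close one side of the string.
VOCAB = ["C", "N", "O", "c", "1", "(", ")", "=", "E"]


def sample_token(context, side, rng):
    """Stand-in for a trained model: ignores context and samples uniformly."""
    return rng.choice(VOCAB)


def grow_bidirectional(max_len=20, seed=0):
    """Grow a token sequence outward from a central start token,
    alternating between the right and the left end."""
    rng = random.Random(seed)
    tokens = ["G"]  # assumed start token in the middle of the sequence
    left_open, right_open = True, True
    side = "right"
    while (left_open or right_open) and len(tokens) < max_len:
        if side == "right" and right_open:
            tok = sample_token(tokens, "right", rng)
            if tok == "E":
                right_open = False  # right end is finished
            else:
                tokens.append(tok)
        elif side == "left" and left_open:
            tok = sample_token(tokens, "left", rng)
            if tok == "E":
                left_open = False  # left end is finished
            else:
                tokens.insert(0, tok)
        # Alternate the growth direction after every step.
        side = "left" if side == "right" else "right"
    # Drop the start token before returning the string.
    return "".join(t for t in tokens if t != "G")


if __name__ == "__main__":
    print(grow_bidirectional(max_len=20, seed=0))
```

In a real model, `sample_token` would be replaced by a network that conditions on the tokens already generated on both sides; a unidirectional forward RNN corresponds to growing only on the right.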
