GEN: highly efficient SMILES explorer using autodidactic generative examination networks.

Author information

van Deursen Ruud, Ertl Peter, Tetko Igor V, Godin Guillaume

Affiliations

Firmenich SA, Research and Development, Rue des Jeunes 1, Les Acacias, 1227, Geneva, Switzerland.

Novartis Institutes for BioMedical Research, Novartis Campus, 4056, Basel, Switzerland.

Publication information

J Cheminform. 2020 Apr 10;12(1):22. doi: 10.1186/s13321-020-00425-8.

Abstract

Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and are frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to training deep generative networks for SMILES generation. In our GENs, we use an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of generated SMILES. GENs autonomously learn the target space in a few epochs and are stopped early using an independent online examination mechanism that measures the quality of the generated set. Here we use online statistical quality control (SQC) on the percentage of valid molecular SMILES as the examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95-98%) can be generated using multiple parallel encoding layers in combination with SMILES augmentation by unrestricted SMILES randomization. Our trained models combine an excellent novelty rate (85-90%) with strong conservation of the property space (95-99%) in the generated SMILES. In GENs, both the generative network and the examination mechanism are open to other architectures and quality criteria.
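
The early-stopping idea in the abstract (examine the percentage of valid SMILES after each epoch and keep the earliest stable weights) can be illustrated with a minimal sketch. The abstract does not state the actual SQC rule, so the function name `earliest_stable_epoch`, the sliding window, and both thresholds below are assumptions chosen only for illustration, and the validity history is made-up data, not results from the paper.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def earliest_stable_epoch(
    validity: Sequence[float],
    window: int = 3,         # hypothetical: consecutive epochs examined together
    min_mean: float = 95.0,  # hypothetical: required mean % of valid SMILES
    max_std: float = 1.0,    # hypothetical: allowed spread inside the window
) -> Optional[int]:
    """Return the 1-based epoch closing the first stable window, or None.

    A window counts as stable when its mean validity is at least `min_mean`
    and its population standard deviation is at most `max_std`.
    """
    for end in range(window, len(validity) + 1):
        recent = validity[end - window:end]
        if mean(recent) >= min_mean and pstdev(recent) <= max_std:
            return end
    return None


# Made-up per-epoch validity percentages, for illustration only.
history = [42.0, 71.5, 88.0, 94.2, 96.1, 96.8, 96.5, 97.0]
stable_epoch = earliest_stable_epoch(history)
print(stable_epoch)
```

In a real training loop, the validity values would come from parsing a sample of generated SMILES with a cheminformatics toolkit after each epoch, and the weights saved at the returned epoch would be the ones kept.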

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2108/7146994/7cf063c24c0b/13321_2020_425_Fig1_HTML.jpg
