Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland.
Stanford University, Department of Computer Science, 450 Serra Mall, Stanford, CA, 94305, USA.
Mol Inform. 2018 Jan;37(1-2). doi: 10.1002/minf.201700111. Epub 2017 Nov 2.
Generative artificial intelligence models offer a fresh approach to chemogenomics and de novo drug design, because they allow researchers to narrow their search of chemical space and focus on regions of interest. We present a method for molecular de novo design that uses generative recurrent neural networks (RNNs) containing long short-term memory (LSTM) cells. This computational model captured the syntax of molecular representation in terms of SMILES strings with close to perfect accuracy. The learned pattern probabilities can be used for de novo SMILES generation. This molecular design concept eliminates the need for virtual compound library enumeration. By employing transfer learning, we fine-tuned the RNN's predictions for specific molecular targets. This approach enables virtual compound design without requiring secondary or external activity prediction, which could introduce error or unwanted bias. The results obtained advocate this generative RNN-LSTM system for high-impact use cases, such as low-data drug discovery, fragment-based molecular design, and hit-to-lead optimization for diverse drug targets.
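The idea of generating SMILES from learned pattern probabilities, and then biasing the generator toward a target with a small second data set, can be illustrated with a minimal sketch. This is not the authors' model. A character-level bigram count table stands in for the trained RNN-LSTM, and re-weighted count updates stand in for transfer learning. The function names, the start/end markers, the `weight` parameter, and the example molecules are all assumptions made for illustration. The sampled strings are not checked for chemical validity.

```python
import random
from collections import defaultdict

# Hypothetical start/end markers; neither is a SMILES symbol.
START, END = "^", "$"


def fit_bigram(smiles_list, counts=None, weight=1.0):
    """Accumulate next-character counts (a simple stand-in for RNN training).

    Passing an existing `counts` table continues training on new data,
    which loosely mimics fine-tuning a pretrained model.
    """
    if counts is None:
        counts = defaultdict(lambda: defaultdict(float))
    for smi in smiles_list:
        seq = START + smi + END
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += weight
    return counts


def sample_smiles(counts, rng, temperature=1.0, max_len=100):
    """Sample one string character by character from the learned counts."""
    prev, out = START, []
    for _ in range(max_len):
        options = counts[prev]
        chars = list(options)
        # Temperature sharpens (T < 1) or flattens (T > 1) the distribution.
        weights = [options[c] ** (1.0 / temperature) for c in chars]
        nxt = rng.choices(chars, weights=weights, k=1)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    # "Pretraining" on a tiny, made-up general set of SMILES strings.
    general = ["CCO", "c1ccccc1", "CC(=O)O", "CCN(CC)CC"]
    model = fit_bigram(general)
    # "Fine-tuning" on a made-up target-focused set, weighted more heavily.
    focused = ["CC(=O)Nc1ccccc1"]
    model = fit_bigram(focused, counts=model, weight=5.0)
    for _ in range(3):
        print(sample_smiles(model, rng, temperature=0.8))
```

A bigram table only sees one preceding character, so it cannot track ring closures or balanced parentheses. An LSTM carries longer context, which is what allows a model like the one described above to learn SMILES syntax. The sampling loop itself has the same shape in both cases: draw the next symbol from a conditional distribution, append it, and stop at the end marker.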