Blaschke Thomas, Engkvist Ola, Bajorath Jürgen, Chen Hongming
Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden.
Department of Life Science Informatics, LIMES Program Unit Chemical Biology and Medicinal Chemistry B-IT, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, Bonn, 53115, Germany.
J Cheminform. 2020 Nov 10;12(1):68. doi: 10.1186/s13321-020-00473-0.
In de novo molecular design, recurrent neural networks (RNNs) have been shown to be effective for sampling and generating novel chemical structures. Using reinforcement learning (RL), an RNN can be tuned with a scoring function to target a particular region of chemical space with optimized desirable properties. However, ligands generated by current RL methods tend to have relatively low diversity, and optimization towards desired properties sometimes even yields duplicate structures. Here, we propose a new method to address the low-diversity issue in RL for molecular design. Memory-assisted RL extends established RL by introducing a so-called memory unit. As a proof of concept, we applied our method to generate structures with a desired AlogP value. In a second case study, we applied our method to design ligands for the dopamine type 2 receptor and the 5-hydroxytryptamine type 1A receptor. For each receptor, a machine learning model was developed to predict whether generated molecules were active at that receptor. In both case studies, memory-assisted RL generated more compounds predicted to be active, with higher chemical diversity, and thus achieved better coverage of the chemical space of known ligands than established RL methods.
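The abstract does not say how the memory unit works, so the following is only a minimal sketch of one plausible reading: a memory that remembers high-scoring structures and removes the reward for new structures that fall into an already crowded neighbourhood. Everything in it is an assumption made for illustration: the class name `MemoryUnit`, the thresholds, the bucket size, and the use of character n-grams of a SMILES string as a toy substitute for a real molecular fingerprint. It should not be read as the authors' implementation.

```python
from typing import List, Set


def ngram_features(smiles: str, n: int = 3) -> Set[str]:
    """Toy stand-in for a molecular fingerprint (assumption): the set of
    character n-grams of the SMILES string."""
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class MemoryUnit:
    """Hypothetical memory unit: groups high-scoring structures into buckets
    of similar structures and zeroes the score once a bucket is full."""

    def __init__(self, score_threshold: float = 0.5,
                 similarity_threshold: float = 0.6,
                 bucket_size: int = 2) -> None:
        self.score_threshold = score_threshold
        self.similarity_threshold = similarity_threshold
        self.bucket_size = bucket_size
        # Each bucket is a list of feature sets; the first entry is the
        # reference structure that new candidates are compared against.
        self.buckets: List[List[Set[str]]] = []

    def adjust(self, smiles: str, score: float) -> float:
        """Return the score to feed back to the RL agent."""
        if score < self.score_threshold:
            # Low-scoring structures are not remembered or penalized.
            return score
        features = ngram_features(smiles)
        for bucket in self.buckets:
            if tanimoto(features, bucket[0]) >= self.similarity_threshold:
                if len(bucket) >= self.bucket_size:
                    # Neighbourhood already crowded: remove the reward.
                    return 0.0
                bucket.append(features)
                return score
        # No similar structure seen before: open a new bucket.
        self.buckets.append([features])
        return score


if __name__ == "__main__":
    memory = MemoryUnit(bucket_size=2)
    for candidate in ["CCCCCC", "CCCCCC", "CCCCCC", "c1ccccc1O"]:
        print(candidate, memory.adjust(candidate, 0.9))
```

In an RL loop, the adjusted score would replace the raw scoring-function output, so repeatedly generating near-duplicates of an already rewarded structure stops paying off and the generator is pushed towards other regions of chemical space. A real setup would use a cheminformatics toolkit for fingerprints rather than string n-grams.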