Thomas Morgan, O'Boyle Noel M, Bender Andreas, de Graaf Chris
Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.
Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK.
J Cheminform. 2022 Oct 3;14(1):68. doi: 10.1186/s13321-022-00646-z.
A plethora of AI-based techniques now exists to conduct de novo molecule generation that can devise molecules conditioned towards a particular endpoint in the context of drug design. One popular approach is using reinforcement learning to update a recurrent neural network or language-based de novo molecule generator. However, reinforcement learning can be inefficient, sometimes requiring on the order of 10^5 molecules to be sampled to optimize more complex objectives, which poses a limitation when using computationally expensive scoring functions such as docking or computer-aided synthesis planning models. In this work, we propose a reinforcement learning strategy called Augmented Hill-Climb, a simple, hypothesis-driven hybrid of REINVENT and Hill-Climb that improves sample-efficiency by addressing the limitations of both currently used strategies. We compare its ability to optimize several docking tasks with REINVENT and benchmark it against other commonly used reinforcement learning strategies, including REINFORCE, REINVENT (versions 1 and 2), Hill-Climb and best agent reminder. We find that optimization ability is improved ~1.5-fold and sample-efficiency is improved ~45-fold compared to REINVENT, while still delivering appealing chemistry as output. Diversity filters were used, and their parameters were tuned to mitigate observed failure modes that exploit certain diversity filter configurations. We find that Augmented Hill-Climb outperforms the other reinforcement learning strategies on six tasks, especially in the early stages of training and for more difficult objectives. Lastly, we show improved performance not only with recurrent neural networks but also with a reinforcement learning-stabilized transformer architecture. Overall, we show that Augmented Hill-Climb improves sample-efficiency when conditioning language-based de novo molecule generation via reinforcement learning, compared to the current state-of-the-art. This makes more computationally expensive scoring functions, such as docking, accessible on a relevant timescale.
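As a rough illustration of the hybrid described above, the sketch below shows one plausible form of an Augmented Hill-Climb update step: only the top-scoring fraction of each sampled batch (the Hill-Climb component) is used to compute a REINVENT-style augmented-likelihood regression loss. This is a minimal PyTorch sketch under assumed naming; agent_nll, prior_nll, score_fn-style inputs, sigma = 60 and the 25% top-k fraction are illustrative choices, not values or code taken from the paper.

```python
import torch

def ahc_loss(agent_nll, prior_nll, scores, sigma=60.0, topk_fraction=0.25):
    """Illustrative Augmented Hill-Climb loss for one sampled batch.

    agent_nll, prior_nll : per-molecule negative log-likelihoods of the sampled
                           SMILES under the agent and the fixed prior (assumed
                           to be provided by the generator; names are ours).
    scores               : scoring-function values in [0, 1] for the same batch.
    sigma                : weight of the score in the augmented likelihood.
    topk_fraction        : fraction of the batch kept for the update.
    """
    # Hill-Climb component: keep only the best-scoring fraction of the batch.
    k = max(1, int(topk_fraction * scores.numel()))
    top_idx = torch.topk(scores, k).indices

    # REINVENT component: augmented (negative) log-likelihood combines the
    # prior likelihood with the scaled score of each kept molecule.
    augmented_nll = prior_nll[top_idx] - sigma * scores[top_idx]

    # Regress the agent's likelihood towards the augmented likelihood.
    return torch.mean((augmented_nll - agent_nll[top_idx]) ** 2)

if __name__ == "__main__":
    # Stand-in tensors for illustration only; in practice these come from
    # sampling the agent and scoring the resulting molecules.
    batch = 64
    agent_nll = torch.rand(batch) * 30
    prior_nll = torch.rand(batch) * 30
    scores = torch.rand(batch)
    print(ahc_loss(agent_nll, prior_nll, scores))
```

In this reading, restricting the augmented-likelihood update to the top-k molecules is what recovers Hill-Climb's sample-efficiency while retaining REINVENT's prior regularization; the actual selection fraction and sigma used in the study should be taken from the paper itself.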