University of Electronic Science and Technology of China (UESTC).
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa364.
The size and quality of the chemical libraries fed into the drug discovery pipeline are crucial for developing new drugs or repurposing existing ones. Existing techniques such as combinatorial organic synthesis and high-throughput screening usually make the process extraordinarily difficult and complicated, since the search space of synthetically feasible drugs is enormous. While reinforcement learning has been widely exploited in the literature for generating novel compounds, the requirement of designing a reward function that succinctly represents the learning objective can prove daunting in certain complex domains. Generative adversarial network-based methods, in turn, mostly discard the discriminator after training and can be hard to train. In this study, we propose a framework for training a compound generator and learning a transferable reward function based on the maximum-entropy inverse reinforcement learning (IRL) paradigm. Our experiments show that the IRL route offers a rational alternative for generating chemical compounds in domains where reward function engineering may be unappealing or infeasible while data exhibiting the desired objective is readily available.
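The core idea behind maximum-entropy IRL is to learn a reward function directly from demonstration data rather than hand-engineer it: the reward parameters are pushed to score expert examples higher and the current generator's samples lower, following the feature-expectation-matching gradient E_expert[φ] − E_policy[φ]. The sketch below is a deliberately minimal illustration of that update on SMILES-like strings; the bag-of-characters feature map, the toy data, and all hyperparameters are hypothetical choices for exposition and are not taken from the paper.

```python
import numpy as np

# Toy maximum-entropy IRL reward update for sequence data.
# Feature map, vocabulary, data and learning rate are all illustrative
# assumptions, not the framework proposed in the paper.

VOCAB = list("CNOcno()=#123")

def features(seq):
    """Length-normalized bag-of-characters feature vector."""
    phi = np.zeros(len(VOCAB))
    for ch in seq:
        if ch in VOCAB:
            phi[VOCAB.index(ch)] += 1.0
    return phi / max(len(seq), 1)

def reward(theta, seq):
    """Linear reward r(s) = theta . phi(s)."""
    return float(theta @ features(seq))

def irl_update(theta, expert_seqs, sampled_seqs, lr=0.1):
    """One max-ent IRL gradient step: raise reward on expert data,
    lower it on the generator's samples (E_expert[phi] - E_policy[phi])."""
    g_expert = np.mean([features(s) for s in expert_seqs], axis=0)
    g_policy = np.mean([features(s) for s in sampled_seqs], axis=0)
    return theta + lr * (g_expert - g_policy)

# Hypothetical demonstration data: valid-looking vs. junk strings.
expert = ["CCO", "CCN", "c1ccccc1O"]
sampled = ["((((", "C#N#", "123"]

theta = np.zeros(len(VOCAB))
for _ in range(50):
    theta = irl_update(theta, expert, sampled)

# After training, expert-like strings score higher than junk samples,
# e.g. reward(theta, "CCO") > reward(theta, "((((").
```

In the full framework the generator would be retrained against the learned reward and fresh samples drawn each round; the learned reward is also what makes the function transferable, since it is a standalone model rather than a discarded training artifact.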