Blanchard Andrew E, Stanley Christopher, Bhowmik Debsindhu
Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA.
J Cheminform. 2021 Feb 23;13(1):14. doi: 10.1186/s13321-021-00494-3.
The process of drug discovery involves a search over the space of all possible chemical compounds. Generative Adversarial Networks (GANs) provide a valuable tool for exploring chemical space and for optimizing known compounds toward a desired functionality. Standard approaches to training GANs, however, can result in mode collapse, in which the generator primarily produces samples closely related to a small subset of the training data. In contrast, the search for novel compounds requires exploration beyond the original data. Here, we present an approach to training GANs that uses concepts from Genetic Algorithms to promote incremental exploration and limit the impact of mode collapse. In our approach, valid samples from the generator are used to replace samples in the training data. We consider both random and guided selection, along with recombination, during replacement. By tracking the number of novel compounds produced during training, we show that updates to the training data drastically outperform the traditional approach, increasing the potential applications of GANs in drug discovery.
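The replacement step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and no GAN is trained here. Samples are assumed to be plain strings (for example, SMILES). The names `update_training_data`, `count_novel`, `crossover` and `toy_is_valid` are hypothetical, as are the single-point string crossover used for recombination and the rule for guided selection, which here keeps the highest-scoring members of the old data plus the new valid samples. The abstract does not specify any of these choices.

```python
import random
from typing import Callable, List, Optional, Set


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover of two string-encoded samples (assumed encoding)."""
    return a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)) :]


def update_training_data(
    training_data: List[str],
    generated: List[str],
    is_valid: Callable[[str], bool],
    score: Optional[Callable[[str], float]] = None,
    recombine: bool = False,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a new training set in which valid generator samples replace old entries.

    score=None -> random selection; otherwise guided selection (hypothetical rule:
    keep the highest-scoring members of the old data plus the new candidates).
    """
    if rng is None:
        rng = random.Random()
    # Only valid generator outputs may enter the training data.
    candidates = [s for s in generated if is_valid(s)]
    if recombine and len(candidates) >= 2:
        # Recombine random pairs of valid samples; keep children that are still valid.
        children = [crossover(*rng.sample(candidates, 2), rng) for _ in candidates]
        candidates += [c for c in children if is_valid(c)]
    # Deduplicate (order-preserving) and skip samples already in the training data.
    existing = set(training_data)
    candidates = [c for c in dict.fromkeys(candidates) if c not in existing]

    new_data = list(training_data)
    if score is None:
        # Random selection: overwrite randomly chosen entries with random candidates.
        n = min(len(candidates), len(new_data))
        for idx, cand in zip(rng.sample(range(len(new_data)), n), rng.sample(candidates, n)):
            new_data[idx] = cand
        return new_data
    # Guided selection: rank the combined pool by score and keep the original size.
    pool = new_data + candidates
    return sorted(pool, key=score, reverse=True)[: len(training_data)]


def count_novel(generated: List[str], is_valid: Callable[[str], bool],
                original_data: Set[str], seen: Set[str]) -> int:
    """Count valid samples absent from the original data and from earlier steps."""
    novel = {s for s in generated if is_valid(s) and s not in original_data and s not in seen}
    seen |= novel  # update the running record of novel samples in place
    return len(novel)


def toy_is_valid(s: str) -> bool:
    # Hypothetical validity check standing in for a real chemistry parser.
    return 0 < len(s) <= 8 and set(s) <= set("CNO")


if __name__ == "__main__":
    rng = random.Random(0)
    original = ["CCO", "CCN", "COC", "CNC"]
    data, seen = list(original), set()
    for step in range(5):
        # Stand-in for sampling the GAN generator (no GAN is trained here).
        generated = [rng.choice(data) + rng.choice("CNOX") for _ in range(8)]
        print(step, count_novel(generated, toy_is_valid, set(original), seen))
        data = update_training_data(data, generated, toy_is_valid,
                                    score=lambda s: s.count("N"), recombine=True, rng=rng)
```

In a real run, `generated` would come from sampling the trained generator between epochs, and the validity check would parse each sample with a cheminformatics toolkit. Passing `score=None` gives random replacement, while supplying a scoring function gives the guided variant. The running `seen` set lets the novelty count reflect only compounds not produced at earlier steps.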