Graduate School of AI, POSTECH, 77 Cheongam-Ro, Pohang, 37673, Gyeongbuk, Republic of Korea.
Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, Seoul, 03722, Republic of Korea.
J Comput Aided Mol Des. 2024 Aug 27;38(1):32. doi: 10.1007/s10822-024-00571-3.
Over the last decade, automatic chemical design frameworks for discovering molecules with drug-like properties have progressed significantly. Among them, the variational autoencoder (VAE) is a cutting-edge approach that models a tractable latent space of the molecular space. In particular, using a VAE together with a property estimator has attracted considerable interest because it enables gradient-based optimization of a given molecule. However, although successful experimental results have been achieved, the theoretical background and the prerequisites for the correct operation of this method have not yet been clarified. We therefore theoretically analyze and rigorously reconstruct the entire framework. From the perspective of parameterized distributions and information theory, we first describe how the previous model overcomes the limitations of the beta VAE in discovering molecules with desired properties. We then describe the prerequisites for training this model. Next, from the log-likelihood perspective of each term, we reformulate the objectives for exploring the latent space to generate drug-like molecules. We also define distributional constraints that keep the search away from invalid molecules. We demonstrate that our model can discover a novel chemical compound targeting BCL-2 family proteins in a de novo approach. Through the theoretical analysis and practical implementation, we verify the importance of these prerequisites and constraints for operating the model.
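As a rough illustration of what gradient-based optimization in a latent space under a distributional constraint can look like, the following toy sketch may help. It is not the authors' model: the linear property estimator (`w`, `b`), the standard Gaussian prior used as the constraint, the weight `lam`, and the function names are all assumptions made for this example, and no encoder or decoder is included.

```python
import numpy as np

def objective(z, w, b, lam=1.0):
    """Toy objective: estimated property plus a Gaussian log-prior term.

    J(z) = (w . z + b) - lam * 0.5 * ||z||^2
    The linear estimator and the N(0, I) prior are illustrative assumptions.
    """
    return float(w @ z + b - 0.5 * lam * (z @ z))

def optimize_latent(z0, w, b, lam=1.0, lr=0.1, steps=200):
    """Plain gradient ascent on J(z), starting from the latent point z0."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(steps):
        # Gradient of the property term is w; gradient of the prior term is -lam * z.
        grad = w - lam * z
        z = z + lr * grad
    return z

# Hypothetical 3-dimensional latent space and estimator weights.
w = np.array([1.0, -2.0, 0.5])
b = 0.1
z_start = np.zeros(3)
z_opt = optimize_latent(z_start, w, b, lam=1.0)
```

Without the prior term, the linear estimator would push `z` away from the origin without bound, into regions a decoder was never trained on. With it, the ascent settles at `w / lam`, which is one simple way to picture how a distributional constraint can keep a search near the region the model has learned.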