Zhang Hanwen, Xiong Deng, Liu Xianggen, Lv Jiancheng
College of Computer Science, Sichuan University, No.24 South Section 1, Yihuan Road, Chengdu 610065, China.
Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, No.24 South Section 1, Yihuan Road, Chengdu 610065, China.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf094.
Structure-based drug design aims to generate molecules that fill the cavity of a protein pocket with high binding affinity. Many contemporary studies employ sequential generative models, whose standard training method is to sequentialize molecular graphs into ordered sequences and then maximize the likelihood of the resulting sequences. However, the exact likelihood is computationally intractable because it involves a sum over all possible sequential orders: molecular graphs lack an inherent order, and the number of orders is factorial in the graph size. To avoid this intractable space of factorially many orders, existing works pre-define a fixed node ordering scheme, such as depth-first search, to sequentialize the 3D molecular graphs. In these cases, the training objectives are loose lower bounds of the exact likelihoods, which is suboptimal for generation. To address these challenges, we propose a unified generative framework named MolEM that jointly learns the 3D molecular graphs and their corresponding sequential orders. We derive a tight lower bound of the likelihood and maximize it via a variational expectation-maximization algorithm, opening a new line of research in learning-based ordering schemes for 3D molecular graph generation. In addition, we are the first to incorporate the molecular docking method QuickVina 2 to manipulate the binding poses, leading to accurate and flexible ligand conformations. Experimental results demonstrate that MolEM significantly outperforms baseline models in generating molecules with high binding affinities and realistic structures. Our approach efficiently approximates the true marginal graph likelihood and identifies reasonable orderings for 3D molecular graphs that align well with relevant chemical priors.
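The relationship between the exact likelihood, the fixed-order objective, and a variational lower bound can be illustrated with a small numerical sketch. This is not the MolEM implementation: the joint probabilities below are made-up values for a hypothetical 3-atom graph, and the names `log_marginal` and `elbo` are introduced here only for illustration. The sketch shows that a single fixed ordering gives a loose bound, and that the bound becomes tight when the ordering distribution matches the posterior over orderings, which is the quantity a variational E-step tries to approximate.

```python
import itertools
import math


def log_marginal(joint):
    """Exact log p(G): log of the sum of p(G, pi) over every ordering pi."""
    return math.log(sum(joint.values()))


def elbo(joint, q):
    """Lower bound E_q[log p(G, pi) - log q(pi)] for an ordering distribution q."""
    return sum(
        w * (math.log(joint[pi]) - math.log(w))
        for pi, w in q.items()
        if w > 0  # orderings with zero weight contribute nothing
    )


# All 3! = 6 orderings of a hypothetical 3-atom graph.
orders = list(itertools.permutations(range(3)))

# Assumed toy values of p(G, pi); a real model would compute these.
toy_values = [0.020, 0.015, 0.005, 0.004, 0.003, 0.003]
joint = dict(zip(orders, toy_values))

exact = log_marginal(joint)

# Fixed ordering scheme: all mass on one order, so the bound is log p(G, pi_0).
fixed = elbo(joint, {orders[0]: 1.0})

# Uniform distribution over orderings: still a valid, but loose, bound.
uniform = elbo(joint, {pi: 1.0 / len(orders) for pi in orders})

# Posterior p(pi | G): the bound coincides with the exact log-likelihood.
total = sum(joint.values())
posterior = {pi: p / total for pi, p in joint.items()}
tight = elbo(joint, posterior)

print(f"exact log p(G)      : {exact:.4f}")
print(f"fixed-order bound   : {fixed:.4f}")
print(f"uniform-order bound : {uniform:.4f}")
print(f"posterior bound     : {tight:.4f}")
```

Enumerating every ordering is feasible only for such a tiny graph; the factorial growth noted in the abstract is the reason a learned, approximate ordering distribution is needed at realistic sizes.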