Xia Song, Gu Yaowen, Zhang Yingkai
Department of Chemistry, New York University, New York, New York 10003, United States.
Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States.
J Chem Inf Model. 2025 Feb 10;65(3):1101-1114. doi: 10.1021/acs.jcim.4c01014. Epub 2025 Jan 17.
Molecular docking is a critical task in structure-based virtual screening. Recent advancements have showcased the efficacy of diffusion-based generative models for blind docking tasks. However, these models do not inherently estimate protein-ligand binding strength and thus cannot be applied directly to virtual screening. Protein-ligand scoring functions are fast, approximate computational methods for evaluating the binding strength between a protein and a ligand. In this work, we introduce the normalized mixture density network (NMDN) score, a deep learning (DL)-based scoring function that learns the probability density distribution of distances between protein residues and ligand atoms. The NMDN score addresses limitations observed in existing DL scoring functions and performs robustly in both pose selection and virtual screening tasks. Additionally, we incorporate an interaction module that predicts the experimental binding affinity, making full use of the learned protein and ligand representations. Finally, we present an end-to-end blind docking and virtual screening protocol named DiffDock-NMDN. For each protein-ligand pair, we employ DiffDock to sample multiple poses, then use the NMDN score to select the optimal binding pose, and finally estimate the binding affinity using scoring functions. Our protocol achieves an average enrichment factor of 4.96 on the LIT-PCBA data set, proving effective in real-world drug discovery scenarios where binder information is limited. This work not only presents a robust DL-based scoring function with superior pose selection and virtual screening capabilities but also offers a blind docking protocol and benchmarks to guide future scoring function development.
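The two quantitative steps named in the abstract, selecting one pose per protein-ligand pair by score and summarizing a screen with an enrichment factor, can be illustrated with a minimal sketch. This is not the authors' implementation. The names `select_best_pose`, `enrichment_factor`, `pose_score`, and `top_fraction` are hypothetical, `pose_score` is only a stand-in for a pose-level score such as the NMDN score, and the sketch assumes that a higher score means a better pose or a more likely binder. The abstract does not state which top fraction underlies the reported 4.96, so the fraction is left as a parameter.

```python
from typing import Callable, Sequence, TypeVar

Pose = TypeVar("Pose")


def select_best_pose(poses: Sequence[Pose], pose_score: Callable[[Pose], float]) -> Pose:
    """Return the sampled pose with the highest score.

    `pose_score` is a hypothetical stand-in for a pose-level scoring
    function; higher-is-better is an assumption of this sketch.
    """
    if not poses:
        raise ValueError("at least one pose is required")
    return max(poses, key=pose_score)


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    top_fraction: float = 0.01,
) -> float:
    """Standard enrichment factor for a ranked screening library.

    EF = (actives in top fraction / size of top fraction)
         / (actives in library / size of library)
    """
    n = len(scores)
    if n == 0 or len(is_active) != n:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    n_actives = sum(1 for flag in is_active if flag)
    if n_actives == 0:
        raise ValueError("the library contains no actives")

    # Size of the selected subset, truncated but never smaller than one compound.
    n_top = max(1, int(n * top_fraction))

    # Rank compound indices from the highest to the lowest predicted score.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_active[i])

    # Algebraically equal to (hits / n_top) / (n_actives / n).
    return hits * n / (n_top * n_actives)
```

An enrichment factor of 1 corresponds to random selection, so a value near 5 means the top-ranked subset contains about five times as many actives as a random subset of the same size.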