Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States.
Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States.
J Chem Inf Model. 2024 Mar 25;64(6):1907-1918. doi: 10.1021/acs.jcim.3c02054. Epub 2024 Mar 12.
The protein-ligand binding free energy is a central quantity in structure-based computational drug discovery. Although popular alchemical methods provide statistically sound means of computing the binding free energy for a broad range of systems, they are generally too costly to be applied as frequently as end point or ligand-based methods. By contrast, data-driven approaches are typically fast enough to address thousands of systems, but they transfer less well to unseen systems. We introduce DrΔ-Net (or simply Dragnet), an equivariant graph neural network that can blend ligand-based and protein-ligand data-driven approaches. It is based on a 3D fingerprint representation of the ligand alone and in complex with the protein target. Dragnet is a global scoring function that predicts the binding affinity of arbitrary protein-ligand complexes, but it can be easily tuned via transfer learning to specific systems or end points, performing similarly to common 2D ligand-based approaches in these tasks. Dragnet is evaluated on a total of 28 validation proteins with a set of congeneric ligands derived from BindingDB and one custom set extracted from the ChEMBL database. In general, a handful of experimental binding affinities is sufficient to optimize the scoring function for a particular protein and ligand scaffold. When experimental affinities are not available, predictions from physics-based methods such as absolute free energy perturbation can be used for the transfer-learning tuning of Dragnet. Furthermore, we use our data to illustrate the present limitations of data-driven modeling of binding free energies.
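The abstract refers to both the binding free energy and the binding affinity as the predicted quantity. As a minimal sketch of how the two are related, the snippet below assumes the standard thermodynamic relation ΔG° = RT ln(Kd / c°) with a 1 M standard concentration and a temperature of 298.15 K. The function names, units (kcal/mol), and defaults are illustrative assumptions and are not taken from the paper or from Dragnet itself.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption for this sketch.
R_KCAL_PER_MOL_K = 0.0019872041


def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Convert a dissociation constant (mol/L) to a standard binding
    free energy (kcal/mol), assuming a 1 M standard concentration."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


def delta_g_to_kd(delta_g_kcal, temperature_k=298.15):
    """Inverse of kd_to_delta_g: binding free energy (kcal/mol) to Kd (mol/L)."""
    return math.exp(delta_g_kcal / (R_KCAL_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    # Print the free energy for a few hypothetical affinities.
    for kd in (1e-6, 1e-8, 1e-9):
        print(f"Kd = {kd:.0e} M -> dG = {kd_to_delta_g(kd):.2f} kcal/mol")
```

Under these assumptions, a tenfold change in Kd corresponds to roughly 1.36 kcal/mol at 298.15 K, which gives a sense of the accuracy a scoring function needs in order to rank congeneric ligands.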