Department of Computer Science, California State University, Los Angeles, California.
Kravis Department of Integrated Sciences, Claremont McKenna College, Claremont, California.
Biophys J. 2024 Sep 3;123(17):2839-2848. doi: 10.1016/j.bpj.2024.03.017. Epub 2024 Mar 13.
Fast in silico methods for predicting protein-ligand binding free energies hold significant promise for the initial phases of drug development. Many traditional physics-based models (e.g., implicit solvent models), however, either neglect or heavily approximate entropic contributions to binding because these contributions are computationally expensive to evaluate. Consequently, such methods often yield imprecise assessments of binding strength. Machine learning models provide accurate predictions and can often outperform physics-based models. However, they are often prone to overfitting, and their results can be difficult to interpret. Physics-guided machine learning models combine the consistency of physics-based models with the accuracy of modern data-driven algorithms. This work integrates conformational entropies from a physics-based model into a graph convolutional network. We introduce a new neural network architecture (a rule-based graph convolutional network) that generates molecular fingerprints according to predefined rules specifically optimized for binding free energy calculations. Our results on 100 small host-guest systems demonstrate significant improvements in convergence and in preventing overfitting. We additionally demonstrate the transferability of our proposed hybrid model by training it on the aforementioned host-guest systems and then testing it on six unrelated protein-ligand systems. Our new model shows little difference in training set accuracy compared to a previous model but an order-of-magnitude improvement in test set accuracy. Finally, we show how the results of our hybrid model can be interpreted in a straightforward fashion.
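To make the hybrid idea concrete, the following is a minimal sketch of how a physics-derived conformational entropy could be combined with a graph-convolution fingerprint. It is not the rule-based architecture described above, whose rules the abstract does not specify. The function names, the single aggregation step, the sum-pool readout, and the toy three-atom graph are all illustrative assumptions.

```python
import numpy as np


def graph_conv_fingerprint(adjacency, node_features, weights):
    """One neighbor-aggregation step followed by a sum-pool readout.

    Hypothetical helper: a generic graph convolution, not the paper's
    rule-based fingerprint.
    """
    # Add self-loops so each atom keeps its own features.
    a_hat = adjacency + np.eye(adjacency.shape[0])
    # Row-normalize so each atom averages over itself and its neighbors.
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    # Linear transform of the aggregated features, then ReLU.
    hidden = np.maximum(a_hat @ node_features @ weights, 0.0)
    # Summing over atoms makes the fingerprint independent of atom ordering.
    return hidden.sum(axis=0)


def predict_binding_free_energy(fingerprint, conformational_entropy, readout):
    """Linear readout over the fingerprint with the entropy term appended.

    The entropy value is assumed to come from a separate physics-based
    calculation; here it is just a number supplied by the caller.
    """
    x = np.append(fingerprint, conformational_entropy)
    return float(x @ readout)


# Toy three-atom chain with random features and weights (assumed values).
rng = np.random.default_rng(0)
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
node_features = rng.normal(size=(3, 5))
weights = rng.normal(size=(5, 4))
readout = rng.normal(size=5)  # 4 fingerprint entries + 1 entropy entry

fingerprint = graph_conv_fingerprint(adjacency, node_features, weights)
delta_g = predict_binding_free_energy(fingerprint, -1.2, readout)
```

In this sketch the entropy enters as one extra input to the readout, so its learned coefficient (the last entry of `readout`) can be inspected directly. That is one simple way a hybrid model can remain interpretable, though the paper's own interpretation procedure may differ.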