Lee Ho-Joon, Emani Prashant S, Gerstein Mark B
Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut 06520, United States.
J Chem Inf Model. 2024 Dec 9;64(23):8684-8704. doi: 10.1021/acs.jcim.4c01116. Epub 2024 Nov 22.
The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Many computational models for binding affinity prediction have been developed, but their performance varies across targets. Given that ensembling or meta-modeling approaches have shown great promise in reducing model-specific biases, we develop a framework to integrate published force-field-based empirical docking models and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual base models, training databases, and meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over base models. Our best meta-models achieve performance comparable to that of state-of-the-art deep learning tools based exclusively on 3D structures while allowing for improved database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. We further demonstrate the improved generalization capability of our models using a large-scale affinity prediction benchmark as well as a virtual screening application benchmark. Overall, we demonstrate that diverse modeling approaches can be ensembled to gain meaningful improvement in binding affinity prediction.
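The meta-modeling idea summarized above, in which predictions from heterogeneous base models are combined with explicit molecular features, can be illustrated with a minimal stacking sketch. This is not the authors' implementation: the data are synthetic, the two "base models" are simulated as noisy estimates of a pKd-like affinity, the single feature `mw_scaled` stands in for a physicochemical descriptor, and the linear least-squares meta-model is an assumed choice made only for illustration. All function and variable names are hypothetical.

```python
import random

def fit_linear_meta_model(rows, targets):
    """Least-squares fit of targets ~ w[0] + w[1:] . row via normal equations."""
    X = [[1.0] + list(r) for r in rows]          # prepend an intercept column
    n = len(X[0])
    # Normal equations A w = b, with A = X^T X and b = X^T y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back-substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

def rmse(pred, true):
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5

rng = random.Random(0)

def make_example():
    """Synthetic complex: returns ([docking_pred, dl_pred, mw_scaled], true_affinity)."""
    y = rng.uniform(4.0, 10.0)                   # assumed pKd-like true affinity
    mw_scaled = rng.uniform(0.0, 1.0)            # stand-in molecular descriptor
    # Simulated docking estimate with a descriptor-dependent bias plus noise.
    docking_pred = y + 2.0 * (mw_scaled - 0.5) + rng.gauss(0.0, 1.0)
    # Simulated sequence-based deep learning estimate with independent noise.
    dl_pred = y + rng.gauss(0.0, 1.0)
    return [docking_pred, dl_pred, mw_scaled], y

data = [make_example() for _ in range(300)]
train, test = data[:200], data[200:]

# Fit the meta-model on base-model predictions plus the descriptor.
weights = fit_linear_meta_model([r for r, _ in train], [y for _, y in train])

test_true = [y for _, y in test]
meta_rmse = rmse([predict(weights, r) for r, _ in test], test_true)
dock_rmse = rmse([r[0] for r, _ in test], test_true)
dl_rmse = rmse([r[1] for r, _ in test], test_true)

print(f"docking RMSE: {dock_rmse:.3f}")
print(f"deep learning RMSE: {dl_rmse:.3f}")
print(f"meta-model RMSE: {meta_rmse:.3f}")
```

In this toy setting the meta-model can average out the independent noise of the two simulated base models and use the descriptor to correct the docking bias, which is the kind of model-specific bias reduction the abstract attributes to ensembling. Real base models, descriptors, and meta-learners would behave differently, and held-out evaluation on proper benchmarks remains essential.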