Geometric graph learning with extended atom-types features for protein-ligand binding affinity prediction.

Author information

Rana Md Masud, Nguyen Duc Duy

Affiliation

Department of Mathematics, University of Kentucky, Lexington, 40506, KY, USA.

Publication information

Comput Biol Med. 2023 Sep;164:107250. doi: 10.1016/j.compbiomed.2023.107250. Epub 2023 Jul 17.

Abstract

Understanding and accurately predicting protein-ligand binding affinity is essential in drug design and discovery. Machine learning-based methods are gaining popularity for predicting binding affinity because of their efficiency and accuracy, and because structural and binding affinity data for protein-ligand complexes are increasingly available. Graph theory has been widely applied in biomolecular studies, since graphs model molecules and molecular complexes in a natural manner. In the present work, we upgrade graph-based learners for the study of protein-ligand interactions by integrating extended atom types, such as SYBYL and extended connectivity interaction features (ECIF), into multiscale weighted colored graphs (MWCG). Paired with the gradient boosting decision tree (GBDT) machine learning algorithm, our approach yields two methods, a SYBYL-based GGL-Score and an ECIF-based GGL-Score. Both models are extensively validated for scoring power on three commonly used drug design benchmarks: CASF-2007, CASF-2013, and CASF-2016. The performance of our best model, the SYBYL-based GGL-Score, is compared with other state-of-the-art models in binding affinity prediction on each benchmark. While both models achieve state-of-the-art results, the SYBYL atom-type model outperforms other methods by a wide margin in all benchmarks. Finally, the best-performing SYBYL atom-type model is evaluated on two test sets that are independent of the CASF benchmarks.
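The abstract does not spell out how a weighted colored graph becomes a feature vector, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a generalized exponential kernel, a single scale `eta`, and a distance `cutoff`; all three, along with the function names and toy coordinates, are illustrative choices that do not come from the abstract. Each protein-ligand atom pair is "colored" by its pair of atom types, and the kernel weights are summed per color.

```python
import math
from collections import defaultdict


def generalized_exponential(r, eta, kappa=2.0):
    """Kernel weight exp(-(r/eta)^kappa); eta and kappa are assumed values."""
    return math.exp(-((r / eta) ** kappa))


def colored_subgraph_features(protein_atoms, ligand_atoms, eta=5.0, cutoff=12.0):
    """Sum kernel weights over protein-ligand atom pairs, grouped by type pair.

    Each atom is (atom_type, (x, y, z)). Returns a dict mapping
    (protein_type, ligand_type) to the summed edge weight.
    """
    features = defaultdict(float)
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            r = math.dist(p_xyz, l_xyz)  # Euclidean distance
            if r <= cutoff:  # pairs beyond the cutoff contribute nothing
                features[(p_type, l_type)] += generalized_exponential(r, eta)
    return dict(features)


# Hypothetical toy coordinates in angstroms; labels mimic SYBYL-style names.
protein = [
    ("C.3", (0.0, 0.0, 0.0)),
    ("N.am", (3.0, 0.0, 0.0)),
    ("C.3", (20.0, 0.0, 0.0)),  # too far from the ligand to contribute
]
ligand = [("O.2", (0.0, 4.0, 0.0)), ("C.ar", (3.0, 4.0, 0.0))]

features = colored_subgraph_features(protein, ligand)
for pair, value in sorted(features.items()):
    print(pair, round(value, 4))
```

Repeating the computation with several values of `eta` would give a multiscale feature set, and the resulting fixed-length vectors could then be passed to a gradient boosting regressor trained on measured affinities.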
