Protein-ligand binding affinity prediction exploiting sequence constituent homology.

Affiliations

Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom.

The National Institute of Agricultural Botany, Cambridge CB3 0LE, United Kingdom.

Publication information

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad502.

Abstract

MOTIVATION

Molecular docking is a commonly used approach for estimating binding conformations and the resulting binding affinities. Machine learning has been successfully deployed to improve such affinity estimates. Many methods of varying complexity have been developed that use some or all of the spatial and categorical information available in these structures. These methods have mainly been evaluated on datasets from PDBbind, particularly the Comparative Assessment of Scoring Functions (CASF) 2007, 2013, and 2016 datasets, which have dedicated test sets. This work demonstrates that only a small number of simple descriptors are needed to estimate binding affinity efficiently for these complexes, without knowing the exact binding conformation of the ligand.

RESULTS

The developed approach, which combines a small number of ligand and protein descriptors with gradient boosting trees, demonstrates high performance on the CASF datasets. This includes the commonly used CASF2016 benchmark, where it appears to perform better than any other approach. The methodology is also useful for datasets where the spatial relationship between the ligand and protein is unknown, as demonstrated using a large ChEMBL-derived dataset.
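As a rough illustration of the general idea (structure-free descriptors fed to boosted trees), the sketch below may help. It is not the authors' implementation: the descriptors (amino-acid composition of the protein sequence, crude character counts from a ligand SMILES string), the tiny stump-based boosting loop, and all sequences, SMILES strings, and affinity values are assumptions invented for this example. The actual descriptors and model settings are in the repository named under Availability and Implementation.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def protein_descriptors(sequence):
    """Hypothetical protein descriptor: fraction of each standard amino acid."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]

def ligand_descriptors(smiles):
    """Hypothetical, deliberately crude ligand descriptors from raw SMILES text."""
    return [
        float(len(smiles)),                            # string length as a size proxy
        float(smiles.count("N") + smiles.count("n")),  # nitrogen characters
        float(smiles.count("O") + smiles.count("o")),  # oxygen characters
        sum(ch.isdigit() for ch in smiles) / 2.0,      # ring-closure digit pairs
    ]

def fit_stump(X, residuals):
    """Best single-feature threshold split under squared error, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None

def fit_gbt(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error with depth-1 trees (stumps)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + learning_rate * (lm if row[j] <= t else rm)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def predict(model, x):
    base, lr, stumps = model
    return base + sum(lr * (lm if x[j] <= t else rm) for j, t, lm, rm in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Invented toy data: (protein sequence, ligand SMILES, affinity on a pK-like scale).
toy = [
    ("MKTAYIAKQR", "CCO", 4.2),
    ("MKTAYIAKQR", "c1ccccc1O", 5.1),
    ("GSHMLEDPVA", "CC(=O)Nc1ccc(O)cc1", 6.3),
    ("GSHMLEDPVA", "CCN(CC)CC", 5.0),
    ("MALWMRLLPL", "CC(=O)Oc1ccccc1C(=O)O", 6.8),
    ("MALWMRLLPL", "CCO", 4.0),
]
X = [protein_descriptors(seq) + ligand_descriptors(smi) for seq, smi, _ in toy]
y = [aff for _, _, aff in toy]

model = fit_gbt(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))   # predict the mean everywhere
train_mse = mse(y, [predict(model, x) for x in X])  # fit quality on the toy set
```

Note that no three-dimensional pose enters the feature vector, which is the point the abstract makes. In practice a library implementation of gradient boosting and a held-out test set (such as the CASF core sets) would replace the toy loop and the training-set error shown here.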

AVAILABILITY AND IMPLEMENTATION

Code and data are available at https://github.com/abbiAR/PLBAffinity.
