Parameter estimation for scoring protein-ligand interactions using negative training data.

Author information

Pham Tuan A, Jain Ajay N

Affiliation

Cancer Research Institute, Department of Biopharmaceutical Sciences, University of California, San Francisco, 2340 Sutter Street, San Francisco, California 94143-0128, USA.

Publication information

J Med Chem. 2006 Oct 5;49(20):5856-68. doi: 10.1021/jm050040j.

DOI:10.1021/jm050040j

PMID:17004701

Abstract

Surflex-Dock employs an empirically derived scoring function to rank putative protein-ligand interactions by flexible docking of small molecules to proteins of known structure. The scoring function employed by Surflex was developed purely on the basis of positive data, comprising noncovalent protein-ligand complexes with known binding affinities. Consequently, scoring function terms for improper interactions received little weight in parameter estimation, and an ad hoc scheme for avoiding protein-ligand interpenetration was adopted. We present a generalized method for incorporating synthetically generated negative training data, which allows for rigorous estimation of all scoring function parameters. Geometric docking accuracy remained excellent under the new parametrization. In addition, a test of screening utility covering a diverse set of 29 proteins and corresponding ligand sets showed improved performance. Maximal enrichment of true ligands over nonligands exceeded 20-fold in over 80% of cases, with enrichment of greater than 100-fold in over 50% of cases.
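The enrichment figures quoted above can be made concrete with a small sketch. The abstract does not define "maximal enrichment", so the version below rests on an assumption: enrichment at a score threshold is the true-positive rate divided by the false-positive rate, and the maximal value is the largest such ratio over all thresholds. The function name `max_enrichment`, the convention that higher scores mean better predicted binding, and the choice to skip thresholds with no false positives are illustrative and not taken from the paper.

```python
def max_enrichment(ligand_scores, decoy_scores):
    """Return the largest TPR/FPR ratio over all score thresholds.

    Assumptions (not from the paper): higher scores mean better predicted
    binding, and thresholds at which no decoy passes are skipped because
    the ratio is undefined there.
    """
    if not ligand_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")

    best = 0.0
    # Every observed score is a candidate threshold.
    for threshold in sorted(set(ligand_scores) | set(decoy_scores)):
        true_pos = sum(s >= threshold for s in ligand_scores)
        false_pos = sum(s >= threshold for s in decoy_scores)
        if false_pos == 0:
            continue  # ratio undefined; skipped by assumption
        tpr = true_pos / len(ligand_scores)
        fpr = false_pos / len(decoy_scores)
        best = max(best, tpr / fpr)
    return best


# Toy docking scores for known ligands and for nonligands (made-up values).
ligands = [9.0, 8.0, 3.0]
decoys = [7.0, 2.0, 1.0, 0.5]
print(max_enrichment(ligands, decoys))
```

With realistic screening libraries the nonligand set is much larger than the ligand set, which is what makes ratios of 20-fold or 100-fold possible; the toy lists here only show the mechanics.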
