Pham Tuan A, Jain Ajay N
University of California, San Francisco, Box 0128, San Francisco, CA 94143-0128, USA.
J Comput Aided Mol Des. 2008 May;22(5):269-86. doi: 10.1007/s10822-008-9174-y. Epub 2008 Feb 14.
Empirical scoring functions used in protein-ligand docking calculations are typically trained on a dataset of complexes with known affinities with the aim of generalizing across different docking applications. We report a novel method of scoring-function optimization that supports the use of additional information to constrain scoring function parameters, which can be used to focus a scoring function's training towards a particular application, such as screening enrichment. The approach combines multiple instance learning, positive data in the form of ligands of protein binding sites of known and unknown affinity and binding geometry, and negative (decoy) data of ligands thought not to bind particular protein binding sites or known not to bind in particular geometries. Performance of the method for the Surflex-Dock scoring function is shown in cross-validation studies and in eight blind test cases. Tuned functions optimized with a sufficient amount of data exhibited either improved or undiminished screening performance relative to the original function across all eight complexes. Analysis of the changes to the scoring function suggests that modifications can be learned that are related to protein-specific features such as active-site mobility.
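The combination of multiple instance learning with positive and decoy ligands can be illustrated with a small toy sketch. This is not the Surflex-Dock scoring function or the authors' optimization procedure: the linear `pose_score`, the hinge-style margin update in `tune`, the two-feature pose vectors, and all names and numbers below are assumptions made purely for illustration. Each ligand is treated as a bag of candidate poses, a ligand's score is that of its best-scoring pose, and the weights are adjusted until every known binder outranks every decoy by a margin.

```python
from typing import List, Sequence

Pose = Sequence[float]   # hypothetical per-pose feature vector
Bag = List[Pose]         # all candidate poses generated for one ligand


def pose_score(w: Sequence[float], pose: Pose) -> float:
    """Linear stand-in for an empirical scoring function (an assumption)."""
    return sum(wi * xi for wi, xi in zip(w, pose))


def best_pose(w: Sequence[float], bag: Bag) -> Pose:
    """Multiple-instance step: a ligand is represented by its top-scoring pose."""
    return max(bag, key=lambda p: pose_score(w, p))


def bag_score(w: Sequence[float], bag: Bag) -> float:
    """Score of a ligand = score of its best pose under the current weights."""
    return pose_score(w, best_pose(w, bag))


def tune(w0, actives, decoys, margin=1.0, lr=0.05, epochs=200):
    """Adjust weights so each active outranks each decoy by `margin`.

    Uses a simple hinge-loss subgradient step on violated active/decoy pairs;
    the best pose of each bag is re-selected with the current weights.
    """
    w = list(w0)
    for _ in range(epochs):
        for active in actives:
            for decoy in decoys:
                pa, pd = best_pose(w, active), best_pose(w, decoy)
                if pose_score(w, pa) - pose_score(w, pd) < margin:
                    w = [wi + lr * (xa - xd) for wi, xa, xd in zip(w, pa, pd)]
    return w


# Toy data (invented): two features per pose, two poses per ligand.
actives = [[[0.2, 1.0], [0.1, 0.3]], [[0.3, 0.9], [0.0, 0.2]]]
decoys = [[[0.8, 0.1], [0.5, 0.0]], [[0.9, 0.2], [0.4, 0.1]]]

w_initial = [1.0, 0.0]  # starting weights that happen to favor the decoys
w_tuned = tune(w_initial, actives, decoys)

for label, w in (("initial", w_initial), ("tuned", w_tuned)):
    print(label,
          "actives:", [round(bag_score(w, b), 2) for b in actives],
          "decoys:", [round(bag_score(w, b), 2) for b in decoys])
```

With the starting weights the decoys outscore the actives; after tuning, the ranking is reversed on this toy set. Constraints on known binding geometries, which the abstract also mentions, are not modeled here.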