Center for the Study of Systems Biology, Georgia Institute of Technology, Atlanta, GA 30318, USA.
J Struct Biol. 2011 Mar;173(3):558-69. doi: 10.1016/j.jsb.2010.09.009. Epub 2010 Sep 17.
Exhaustive exploration of molecular interactions at the level of complete proteomes requires efficient and reliable computational approaches to protein function inference. Ligand docking and ranking techniques show considerable promise in their ability to quantify the interactions between proteins and small molecules. Despite advances in the development of docking approaches and scoring functions, the genome-wide application of many ligand docking/screening algorithms is limited by the quality of the binding sites in theoretical receptor models constructed by protein structure prediction. In this study, we describe a new template-based method for the local refinement of ligand-binding regions in protein models using remotely related templates identified by threading. We designed a Support Vector Regression (SVR) model that selects correct binding site geometries from a large ensemble of multiple receptor conformations. The SVR model employs several scoring functions that impose geometrical restraints on the Cα positions, account for the specific chemical environment within a binding site, and optimize the interactions with putative ligands. The SVR score is well correlated with the RMSD from the native structure; in 47% (70%) of the cases, the Pearson's correlation coefficient is >0.5 (>0.3). When applied to weakly homologous models, the average local heavy-atom RMSD from the native structure of the top-ranked (best of top five) binding site geometries is 3.1 Å (2.9 Å) for roughly half of the targets; this represents a 0.1 Å (0.3 Å) average improvement over the original predicted structure. Focusing on the subset of strongly conserved residues, the average heavy-atom RMSD is 2.6 Å (2.3 Å). Furthermore, we estimate the upper bound of template-based binding site refinement using only weakly related proteins to be ∼2.6 Å RMSD. This value also corresponds to the plasticity of the ligand-binding regions in distant homologues.
The Binding Site Refinement (BSR) approach is available to the scientific community as a web server that can be accessed at http://cssb.biology.gatech.edu/bsr/.
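The evaluation quantities quoted in the abstract (heavy-atom RMSD from the native structure, Pearson correlation between score and RMSD, and the top-ranked versus best-of-top-five comparison) can be illustrated with a minimal sketch. This is not the BSR implementation: the function names are hypothetical, the coordinates are assumed to be already superposed and listed in the same atom order, and a lower score is assumed to indicate a geometry predicted to be closer to native.

```python
import math
from typing import Sequence, Tuple

Coords = Sequence[Sequence[float]]  # one (x, y, z) triple per heavy atom


def rmsd(model: Coords, native: Coords) -> float:
    """RMSD between two atom lists.

    Assumption: both lists are in the same atom order and already
    superposed; no fitting is performed here.
    """
    if len(model) != len(native) or len(model) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(model, native)
    )
    return math.sqrt(squared / len(model))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def top_ranked_and_best_of_top(
    scores: Sequence[float], rmsds: Sequence[float], k: int = 5
) -> Tuple[float, float]:
    """Return (RMSD of the top-ranked geometry, lowest RMSD among the top k).

    Assumption: a lower score means the geometry is predicted to be
    closer to the native structure.
    """
    if len(scores) != len(rmsds) or len(scores) == 0:
        raise ValueError("scores and rmsds must be non-empty and equal in length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    top_rmsd = rmsds[order[0]]
    best_rmsd = min(rmsds[i] for i in order[:k])
    return top_rmsd, best_rmsd
```

Given one score and one RMSD per conformation in an ensemble, `pearson(scores, rmsds)` yields the per-target correlation, and `top_ranked_and_best_of_top(scores, rmsds)` yields the pair of values that the abstract reports in the form "top-ranked (best of top five)".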