Sheinerman Felix B, Al-Lazikani Bissan, Honig Barry
Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA.
J Mol Biol. 2003 Dec 5;334(4):823-41. doi: 10.1016/j.jmb.2003.09.075.
Here, we present an approach for the prediction of binding preferences of members of a large protein family for which structural information for a number of family members bound to a substrate is available. The approach involves three steps. First, an accurate multiple alignment of sequences of all members of a protein family is constructed on the basis of a multiple structural superposition of family members with known structure. Second, the methods of continuum electrostatics are used to characterize the energetic contribution of each residue in a protein to the binding of its substrate. Residues that make a significant contribution are mapped onto the protein sequence and are used to define a "binding-site signature" for the complex being considered. Third, sequences whose structures have not been determined are checked to see if they have binding-site signatures similar to one of the known complexes. Predictions of binding affinity to a given substrate are based on similarities in binding-site signature. An important component of the approach is the introduction of a context-specific substitution matrix suitable for comparison of binding-site residues. The methods are applied to the prediction of phosphopeptide selectivity of SH2 domains. To this end, the energetic roles of all protein residues in 17 different complexes of SH2 domains with their cognate targets are analyzed. The total number of residues that make significant contributions to binding is found to vary from 9 to 19 in different complexes. These energetically important residues are found to contribute to binding through a variety of mechanisms, involving both electrostatic and hydrophobic interactions. Binding-site signatures are found to involve residues in different positions in SH2 sequences, some of them as far as 9 Å away from a bound peptide.
Surprisingly, similarities in the signatures of different domains do not correlate with whole-domain sequence identities unless the latter exceed 50%. An extensive comparison with the optimal binding motifs determined by peptide library experiments, as well as with other experimental data, indicates that the similarity in binding preferences of different SH2 domains can be deduced on the basis of their binding-site signatures. The analysis provides a rationale for the empirically derived classification of SH2 domains described by Songyang & Cantley, in that proteins in the same group are found to have similar residues at positions important for binding. Confident predictions of binding preference can be made for about 85% of SH2 domain sequences found in SWISSPROT. The approach described in this work is quite general and can, in principle, be used to analyze binding preferences of members of large protein families for which structural information for a number of family members is available. It also offers a strategy for predicting cross-reactivity of compounds designed to bind to a particular target, for example in structure-based drug design.
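The third step of the approach, comparing a binding-site signature against a sequence of unknown structure, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy sequences, the chosen alignment columns, and in particular the crude class-based scoring, which merely stands in for the context-specific substitution matrix mentioned in the abstract (its actual values are not given here). The sketch assumes the sequences are already aligned, so that a column index refers to the same position in every sequence.

```python
from typing import Dict, Iterable

# Hypothetical physicochemical classes. This is NOT the context-specific
# substitution matrix from the paper; it is a stand-in for illustration.
RESIDUE_CLASS: Dict[str, str] = {
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("AVLIMC", "aliphatic"),
    **dict.fromkeys("FWY", "aromatic"),
    **dict.fromkeys("GP", "special"),
}


def toy_score(a: str, b: str) -> float:
    """Score a residue pair: 1.0 identical, 0.5 same class, 0.0 otherwise."""
    if a == "-" or b == "-":
        return 0.0  # a gap never matches a signature residue
    if a == b:
        return 1.0
    class_a = RESIDUE_CLASS.get(a)
    if class_a is not None and class_a == RESIDUE_CLASS.get(b):
        return 0.5
    return 0.0


def extract_signature(aligned_seq: str, columns: Iterable[int]) -> Dict[int, str]:
    """Map each energetically important alignment column to its residue."""
    return {col: aligned_seq[col] for col in columns}


def signature_similarity(signature: Dict[int, str], aligned_query: str) -> float:
    """Average pair score over signature columns, in the range [0, 1]."""
    if not signature:
        raise ValueError("signature must contain at least one position")
    total = sum(toy_score(res, aligned_query[col]) for col, res in signature.items())
    return total / len(signature)


# Toy aligned sequences (invented, not real SH2 domains).
known_complex = "GSWYFGKISRKDAE"
query_domain = "GSFYHGRL-RRDVE"

# Columns assumed to have been flagged as important by the energetic analysis.
signature = extract_signature(known_complex, [2, 6, 9, 12])
similarity = signature_similarity(signature, query_domain)
print(f"signature similarity: {similarity:.3f}")
```

In this toy case the query shares one identical residue and three same-class residues with the signature. A real application would replace `toy_score` with an appropriate substitution matrix and would need a calibrated cutoff for calling two signatures similar; neither is specified in the abstract.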