Ofran Yanay, Rost Burkhard
CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA.
J Mol Biol. 2003 Jan 10;325(2):377-87. doi: 10.1016/s0022-2836(02)01223-8.
Non-covalent residue side-chain interactions occur in many different types of proteins and facilitate many biological functions. Are the differences between these types of interaction manifested in the sequence compositions and/or the residue-residue contact preferences of the interfaces? Previous studies analysed small data sets and gave contradictory answers. Here, we introduced a new data-mining method that yielded the largest high-resolution data set of interactions analysed, together with an information theory-based analysis method. On the basis of sequence features, we were able to differentiate six types of protein interfaces, each corresponding to a different functional or structural association between residues. In particular, we found significant differences in amino acid composition and residue-residue preferences between interactions of residues within the same structural domain and between different domains, between permanent and transient interfaces, and between interactions associating homo-oligomers and hetero-oligomers. The differences between the six types were so substantial that, using amino acid composition alone, we could predict statistically to which of the six types of interfaces a pool of 1000 residues belongs at 63-100% accuracy. All interfaces differed significantly from the background of all residues in SWISS-PROT, from the group of surface residues, and from internal residues that were not involved in non-trivial interactions. Overall, our results suggest that the interface type could be predicted from sequence and that interface-type specific mean-field potentials may be adequate for certain applications.
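The idea of assigning a pool of residues to an interface type from amino acid composition alone can be illustrated with a minimal sketch. This is not the method of the paper, whose details are not given in the abstract. It assumes a simple nearest-reference rule based on the Kullback-Leibler divergence, and the function names (`composition`, `kl_divergence`, `nearest_type`), the labels `type_A` and `type_B`, and the toy reference pools are all hypothetical placeholders rather than data from the study.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(residues, pseudocount=1.0):
    """Return smoothed amino acid frequencies for a pool of one-letter residues."""
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    # The pseudocount keeps every frequency non-zero so the logarithm is defined.
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def nearest_type(pool, references):
    """Return the label whose reference composition diverges least from the pool."""
    p = composition(pool)
    return min(references, key=lambda label: kl_divergence(p, references[label]))


# Hypothetical toy reference pools (assumption): NOT compositions from the paper.
references = {
    "type_A": composition("LLLLIIVVFFAAMMWW" * 10),  # mostly hydrophobic
    "type_B": composition("KKKKEEEEDDRRSSTT" * 10),  # mostly charged or polar
}

predicted = nearest_type("LIVFALLMKE" * 5, references)
print(predicted)
```

In a realistic setting, the reference compositions would be estimated from annotated interface residues of each type, and accuracy would be measured on held-out pools of a fixed size, such as the pools of 1000 residues mentioned in the abstract.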