Launay Guillaume, Simonson Thomas
Laboratoire de Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, 91128, Palaiseau, France.
BMC Bioinformatics. 2008 Oct 9;9:427. doi: 10.1186/1471-2105-9-427.
Structure-based computational methods are needed to help identify and characterize protein-protein complexes and their function. For individual proteins, the most successful technique is homology modelling. We investigate a simple extension of this technique to protein-protein complexes. We consider a large set of complexes of known structure, each involving a pair of single-domain proteins. The complexes are compared with each other to establish their sequence and structural similarities and the relation between the two. Compared to earlier studies, a simpler dataset, a simpler structural alignment procedure, and an additional energy criterion are used. Next, we compare the X-ray structures to models obtained by threading the native sequence onto other, homologous complexes. An elementary requirement for a successful energy function is to rank the native structure above any threaded structure. We use the DFIREbeta energy function, whose quality and complexity are typical of the models used today. Finally, we compare near-native models to distinctly non-native models.
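The sequence comparison mentioned above can be illustrated with a minimal sketch. It assumes that two sequences have already been aligned and are given as equal-length strings with `-` marking gaps, and that identity is counted over columns where both sequences have a residue. The function name `percent_identity` and this choice of denominator are illustrative assumptions; the abstract does not state how identity was computed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: the inputs are two rows of an existing pairwise alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    aligned = 0    # columns with a residue in both sequences
    identical = 0  # aligned columns where the residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1

    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the study.
    print(percent_identity("ACDE-G", "ACDFHG"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so any threshold on identity should be read together with the definition used.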
If weakly stable complexes are excluded (defined by a binding energy cutoff), as well as a few unusual complexes, a simple homology principle holds: complexes that share more than 35% sequence identity share similar structures and interaction modes; this principle was less clear-cut in earlier studies. The energy function was then tested for its ability to identify experimental structures among sets of decoys, produced by a simple threading procedure. On average, the experimental structure is ranked above 92% of the alternate structures. Thus, discrimination of the native structure is good but not perfect. The discrimination of near-native structures is fair. Typically, a single, alternate, non-native binding mode exists that has a native-like energy. Some of the associated failures may correspond to genuine, alternate binding modes and/or native complexes that are artefacts of the crystal environment. In other cases, additional model filtering with more sophisticated tools is needed.
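The ranking statistic quoted above (the experimental structure ranked above a given percentage of alternate structures, averaged over complexes) can be sketched as follows. This is an illustration only: it assumes that a lower energy means a more favourable structure, that ties do not count in favour of the native, and that each complex is represented by one native energy plus a list of decoy energies. The function names and numbers are hypothetical and are not taken from the study.

```python
def native_rank_fraction(native_energy, decoy_energies):
    """Fraction of decoys whose energy is strictly higher (worse) than the native's.

    Assumption: lower energy is better, so the native "ranks above" a decoy
    when its energy is strictly lower.
    """
    if not decoy_energies:
        raise ValueError("at least one decoy energy is required")
    worse = sum(1 for e in decoy_energies if e > native_energy)
    return worse / len(decoy_energies)


def mean_rank_fraction(complexes):
    """Average the per-complex fractions over (native_energy, decoy_energies) pairs."""
    if not complexes:
        raise ValueError("at least one complex is required")
    fractions = [native_rank_fraction(n, d) for n, d in complexes]
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units.
    toy = [
        (-10.0, [-12.0, -5.0, -3.0, 0.0]),  # one decoy scores better than the native
        (-8.0, [-1.0, -2.0]),               # native scores better than every decoy
    ]
    print(mean_rank_fraction(toy))
```

In the first toy complex, the single decoy with a lower energy than the native plays the role of the alternate, non-native binding mode with a native-like energy described above.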
The results suggest that the simple modelling procedure applied here could help identify and characterize protein-protein complexes. The next step is to apply it on a genomic scale.