de Brevern Alexandre G, Valadié Hélène, Hazout Serge, Etchebest Catherine
Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM U436, Université Denis DIDEROT-Paris 7, France.
Protein Sci. 2002 Dec;11(12):2871-86. doi: 10.1110/ps.0220502.
Protein Blocks (PBs) comprise a structural alphabet of 16 protein fragments, each 5 Cα long. They make it possible to approximate and correctly predict local protein three-dimensional (3D) structures. We have selected the 72 most frequent sequences of five PBs, which we call Structural Words (SWs). Analysis of four different protein data banks shows that SWs cover 92% of the amino acids in them and provide a good structural approximation for residues (i.e., sequences) 9 Cα long. We present most of them in a simple network that describes 90% of the overall residues and, interestingly, includes more than 80% of the amino acids present in coils. Analysis of the network shows the specificity and quality of the 3D descriptions as well as a new type of relation between local folds and amino acid distribution. The results show that the 3D structure of these protein data banks can be easily described by a combination of subgraphs included in the network. Finally, a Bayesian probabilistic approach improved the prediction rate by 4%.
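The selection of Structural Words can be illustrated with a minimal sketch. It assumes that protein structures have already been encoded as strings over the 16 PB letters (conventionally a to p), that one PB is assigned per residue so that consecutive PBs are shifted by one Cα, and that "most frequent" means a plain count of overlapping five-letter windows. The function names (`count_words`, `top_words`, `coverage`, `fragment_length`) and the coverage definition are hypothetical and used only for illustration; the abstract does not specify the authors' exact counting or coverage procedure.

```python
from collections import Counter

WORD_LEN = 5  # a Structural Word is a sequence of five PBs (from the abstract)


def count_words(pb_sequences, word_len=WORD_LEN):
    """Count every overlapping window of `word_len` PB letters."""
    counts = Counter()
    for seq in pb_sequences:
        for i in range(len(seq) - word_len + 1):
            counts[seq[i:i + word_len]] += 1
    return counts


def top_words(pb_sequences, k, word_len=WORD_LEN):
    """Return the k most frequent words (k = 72 in the abstract)."""
    return [w for w, _ in count_words(pb_sequences, word_len).most_common(k)]


def coverage(pb_sequences, words, word_len=WORD_LEN):
    """Fraction of positions lying inside at least one occurrence of a word.

    Assumption: a position counts as covered if any selected word overlaps it.
    """
    words = set(words)
    covered = 0
    total = 0
    for seq in pb_sequences:
        mask = [False] * len(seq)
        for i in range(len(seq) - word_len + 1):
            if seq[i:i + word_len] in words:
                for j in range(i, i + word_len):
                    mask[j] = True
        covered += sum(mask)
        total += len(seq)
    return covered / total if total else 0.0


def fragment_length(n_blocks, block_len=5):
    """Cα span of n consecutive PBs, assuming a one-residue shift between PBs."""
    return block_len + n_blocks - 1


if __name__ == "__main__":
    # Toy PB strings, invented for illustration only.
    toy = ["mmmmmmmm", "ddddddd", "mmmmmabc"]
    print(top_words(toy, 1))
    print(coverage(toy, top_words(toy, 1)))
    print(fragment_length(5))
```

Under the one-residue-shift assumption, five PBs of 5 Cα each span 5 + 4 = 9 Cα, which matches the 9 Cα fragment length quoted in the abstract. On a real data bank, `top_words(sequences, 72)` followed by `coverage(...)` would give a rough analogue of the reported coverage figure, though not necessarily the same value.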