Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia.
Biochemistry (Mosc). 2010 Feb;75(2):192-200. doi: 10.1134/s0006297910020094.
We have performed a statistical analysis of unstructured amino acid residues in protein structures available in the databank of protein structures. Data on the occurrence of disordered regions at the ends and in the middle part of protein chains have been obtained: the regions near the ends (at a distance of less than 30 residues from the N- or C-terminus) contain 66% of all unstructured residues (38% near the N-terminus and 28% near the C-terminus), although these terminal regions include only 23% of all amino acid residues. The frequencies of occurrence of unstructured residues have been calculated for each of the 20 amino acid types at different positions in the protein chain. The relative frequencies of unstructured residues of the 20 types at the termini of protein chains differ from those in the middle part of the chain; that is, residues of the same type have different probabilities of being unstructured in the terminal regions and in the middle part of the chain. The frequencies obtained for the middle part of the protein chain have been used as a scale for predicting disordered regions from amino acid sequence with the FoldUnfold method, which we developed previously. This scale of frequencies of occurrence of unstructured residues correlates at a level of 95% with the contact scale, which we also developed previously and used for the same purpose. Testing the new scale on a database of 427 unstructured proteins and 559 completely structured proteins has shown that it can be successfully used for the prediction of disordered regions in protein chains.
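The per-region frequency calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`region`, `unstructured_frequencies`), the input format (a sequence paired with a boolean mask marking unstructured residues), and the rule that the N-terminal region takes precedence in short chains are all assumptions made for illustration; only the 30-residue terminal window comes from the abstract.

```python
from collections import Counter

# From the abstract: terminal regions lie less than 30 residues from a terminus.
TERMINAL_WINDOW = 30


def region(index, length, window=TERMINAL_WINDOW):
    """Classify a 0-based residue index as 'N', 'C', or 'middle'.

    Assumption: distance is counted from the terminal residue itself
    (distance 0), and the N-terminal region wins if both would apply.
    """
    if index < window:
        return "N"
    if length - 1 - index < window:
        return "C"
    return "middle"


def unstructured_frequencies(chains):
    """Return {region: {residue_type: fraction of residues that are unstructured}}.

    `chains` is an iterable of (sequence, mask) pairs (hypothetical format),
    where mask[i] is True if residue i is unstructured.
    """
    total = Counter()
    disordered = Counter()
    for sequence, mask in chains:
        if len(sequence) != len(mask):
            raise ValueError("sequence and mask must have the same length")
        for i, (residue, is_unstructured) in enumerate(zip(sequence, mask)):
            key = (region(i, len(sequence)), residue)
            total[key] += 1
            if is_unstructured:
                disordered[key] += 1

    frequencies = {}
    for (reg, residue), count in total.items():
        frequencies.setdefault(reg, {})[residue] = disordered[(reg, residue)] / count
    return frequencies


# Toy input, not real data: a 70-residue poly-glycine chain whose first and
# last 10 residues are marked as unstructured.
toy_chain = ("G" * 70, [True] * 10 + [False] * 50 + [True] * 10)
toy_frequencies = unstructured_frequencies([toy_chain])
```

In this sketch, the `"middle"` entry of the result plays the role of the per-type scale mentioned in the abstract; how FoldUnfold then applies such a scale to a sequence is not specified here and is left out.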