Coeytaux Karen, Poupon Anne
Yeast Structural Genomics, IBBMC, Université Paris-Sud, Orsay, France.
Bioinformatics. 2005 May 1;21(9):1891-900. doi: 10.1093/bioinformatics/bti266. Epub 2005 Jan 18.
Partially and wholly unstructured proteins have now been identified in all kingdoms of life, and more commonly in eukaryotic organisms. This intrinsic disorder is related to certain critical functions. Apart from their fundamental interest, unstructured regions in proteins may prevent crystallization. Predicting disordered regions is therefore important for understanding protein function, and may also help in designing genetic constructs.
In this paper we present a computational tool for detecting unstructured regions in proteins, based on two properties of unfolded fragments: (1) disordered regions have a biased composition and (2) they usually contain either small or no hydrophobic clusters. To quantify these two properties, we first calculate the amino acid distributions in structured and unstructured regions. Using these distributions, we calculate for a given sequence fragment the probability that it belongs to a structured region and the probability that it belongs to an unstructured region. For each amino acid, the distance to the nearest hydrophobic cluster is also computed. Using these three values (the two probabilities and the cluster distance) along a protein sequence allows us to predict unstructured regions with very simple rules. The method requires only the primary sequence and no multiple alignment, which makes it suitable for orphan proteins.
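The three per-residue values described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hydrophobic alphabet, the definition of a cluster as a run of at least `min_size` consecutive hydrophobic residues, the window size, the use of a mean log-frequency in place of a probability, and the final decision rule are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("VILFMYW")  # assumption: a simple hydrophobic alphabet


def composition(fragments, pseudocount=1.0):
    """Amino acid frequencies over training fragments, with pseudocounts."""
    counts = Counter()
    for frag in fragments:
        counts.update(c for c in frag if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def mean_log_freq(seq, i, freqs, half_window=5):
    """Mean log-frequency of the residues in a window centred on position i."""
    window = seq[max(0, i - half_window): i + half_window + 1]
    vals = [math.log(freqs[c]) for c in window if c in freqs]
    return sum(vals) / len(vals) if vals else -math.inf


def cluster_positions(seq, min_size=3):
    """Indices inside hydrophobic clusters (assumed: runs of >= min_size)."""
    positions, run_start = [], None
    for i, c in enumerate(seq + "-"):  # sentinel closes a trailing run
        if c in HYDROPHOBIC:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_size:
                positions.extend(range(run_start, i))
            run_start = None
    return positions


def distance_to_cluster(seq, min_size=3):
    """Distance from each residue to the nearest cluster residue (inf if none)."""
    pos = cluster_positions(seq, min_size)
    if not pos:
        return [math.inf] * len(seq)
    return [min(abs(i - p) for p in pos) for i in range(len(seq))]


def predict_unstructured(seq, f_struct, f_unstruct, min_distance=4, half_window=5):
    """Hypothetical rule: unstructured-like composition AND far from any cluster."""
    dist = distance_to_cluster(seq)
    return [
        mean_log_freq(seq, i, f_unstruct, half_window)
        > mean_log_freq(seq, i, f_struct, half_window)
        and dist[i] >= min_distance
        for i in range(len(seq))
    ]


# Toy training fragments, invented purely to exercise the functions.
f_struct = composition(["VILVILFAVL", "LLVIAFWVIM"])
f_unstruct = composition(["SSGSPEESKK", "PEQSSGKKPE"])
flags = predict_unstructured("VILVILAASSGSPEESKKPEQ", f_struct, f_unstruct)
```

In this toy case the hydrophobic N-terminal stretch is left unflagged and the polar C-terminal stretch is flagged. Real use would require distributions estimated from annotated structured and disordered regions, and thresholds tuned on such data.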