Vucetic Slobodan, Brown Celeste J, Dunker A Keith, Obradovic Zoran
Center for Information Science and Technology, Temple University, Philadelphia, Pennsylvania 19122, USA.
Proteins. 2003 Sep 1;52(4):573-84. doi: 10.1002/prot.10437.
Intrinsically disordered proteins are characterized by long regions lacking 3-D structure in their native states, yet they have so far been associated with 28 distinguishable functions. Previous studies showed that predictors trained on disorder from one type of protein often achieve poor accuracy on disorder from proteins of a different type, indicating significant differences in sequence properties among disordered proteins. Two important biological problems are identifying different types, or flavors, of disorder and examining their relationships with protein function. Addressing these problems requires innovative use of computational methods because experimental data and background knowledge on protein disorder are relatively scarce. We developed an algorithm that partitions protein disorder into flavors based on competition among increasing numbers of predictors, with prediction accuracy determining both the number of distinct predictors and the partitioning of the individual proteins. Using 145 variously characterized proteins with long (>30 amino acids) disordered regions, this approach identified 3 flavors, called V, C, and S: the V subset contains 52 segments and 7743 residues, C contains 39 segments and 3402 residues, and S contains 54 segments and 5752 residues. The V, C, and S flavors were distinguishable by amino acid composition, sequence location, and biological function. For the sequences in SwissProt and 28 genomes, protein functions correlate with the commonness and usage of different disorder flavors, suggesting different flavor-function sets across these protein groups. Overall, the results support the flavor-function approach as a useful complement to structural genomics for automatically assigning possible functions to sequences.
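The competition-based partitioning described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the predictors or the input features, so this sketch assumes a single made-up numeric feature per residue and a nearest-centroid rule as a stand-in predictor. The names `ORDER_CENTROID`, `accuracy`, `partition`, and `choose_flavors`, the stopping threshold `min_gain`, and the data are all hypothetical.

```python
from statistics import mean

ORDER_CENTROID = 0.0  # assumed feature value typical of ordered residues (toy choice)

def accuracy(centroid, segment):
    """Fraction of a disordered segment's residues that a predictor calls disordered."""
    hits = sum(abs(x - centroid) < abs(x - ORDER_CENTROID) for x in segment)
    return hits / len(segment)

def partition(segments, k, max_iter=20):
    """Competitively assign segments to k predictors; return (assignment, mean accuracy)."""
    assign = [i % k for i in range(len(segments))]  # deterministic round-robin start
    centroids = [ORDER_CENTROID] * k
    for _ in range(max_iter):
        # Retrain: refit each predictor on the residues of the segments it owns.
        for j in range(k):
            residues = [x for seg, a in zip(segments, assign) if a == j for x in seg]
            if residues:  # a predictor that owns nothing keeps its previous centroid
                centroids[j] = mean(residues)
        # Compete: each segment moves to the predictor that is most accurate on it.
        new_assign = [max(range(k), key=lambda j: accuracy(centroids[j], seg))
                      for seg in segments]
        if new_assign == assign:
            break
        assign = new_assign
    score = mean(accuracy(centroids[a], seg) for seg, a in zip(segments, assign))
    return assign, score

def choose_flavors(segments, max_k=4, min_gain=0.01):
    """Add predictors one at a time until mean accuracy stops improving."""
    best_assign, best_score = partition(segments, 1)
    for k in range(2, max_k + 1):
        assign, score = partition(segments, k)
        if score - best_score < min_gain:
            break
        best_assign, best_score = assign, score
    return best_assign, best_score

# Made-up per-residue feature values for six disordered segments (two hidden groups).
segments = [
    [1.8, 2.1, 2.3, 1.9], [2.0, 2.2, 1.7], [2.4, 1.6, 2.0, 2.1],
    [-2.0, -1.8, -2.2], [-1.9, -2.3, -2.1, -1.7], [-2.0, -2.4, -1.6],
]
assign, score = choose_flavors(segments)
print(assign, score)
```

In this toy data a single predictor recognizes only half of the segments, two predictors recognize all of them, and a third adds nothing, so the loop stops at two groups. The point of the sketch is only that prediction accuracy drives both the segment assignment and the number of predictors retained.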