Frades Itziar, Resjö Svante, Andreasson Erik
Department of Plant Protection Biology, Swedish University of Agricultural Sciences, Alnarp, SE-230 53, Sweden.
BMC Bioinformatics. 2015 Jul 30;16(1):239. doi: 10.1186/s12859-015-0657-2.
How protein phosphorylation relates to kingdom/phylum divergence is largely unknown, although the amino acid residues surrounding a phosphorylation site are of profound importance for protein kinase-substrate interactions. Standard motif analysis is not adequate for large-scale comparative analysis because it assigns each phosphopeptide to a unique motif and performs poorly on unbalanced input datasets.
First, the discriminative n-grams of five species from five different kingdoms/phyla were identified. A signature of 5540 discriminative n-grams that could also be found in other species from the same kingdoms/phyla was created. Using a test data set, classification methods confirmed that the signature can assign species to their corresponding kingdom/phylum. Lastly, orthologous proteins among the proteins containing n-grams were identified in order to determine to what degree the identity of the detected n-grams is a property of the phosphosites rather than a consequence of the species-specific or kingdom/phylum-specific protein inventory. The motifs were grouped into clusters of equal physico-chemical nature. Their distribution was similar between species in the same kingdom/phylum, while clear differences were found among species of different kingdoms/phyla. For example, the top animal-specific discriminative n-grams contained many basic amino acids, whereas the plant-specific motifs were mainly acidic. Secondary structure prediction shows that in the majority of cases the discriminative n-grams lack a regular secondary structure: on average they were 88% random coil, compared with 66% in the phosphoproteins they were derived from.
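The idea of a discriminative n-gram can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the scoring function, the n-gram lengths, or the thresholds, so the frequency-ratio score, the `min_ratio` and `pseudo` parameters, the function names, and the toy sequence windows below are all assumptions made for illustration. The sketch counts the fraction of phosphosite windows in a target group that contain each n-gram, compares it with a background group, and labels the surviving n-grams by a simple charge class.

```python
from collections import Counter

BASIC = set("KRH")   # basic residues (one-letter codes)
ACIDIC = set("DE")   # acidic residues


def ngrams(seq, n):
    """Return all contiguous substrings of length n in seq."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def window_frequency(windows, n_values=(2, 3)):
    """Fraction of windows that contain each n-gram at least once."""
    counts = Counter()
    for window in windows:
        seen = set()
        for n in n_values:
            seen.update(ngrams(window, n))
        counts.update(seen)
    total = len(windows)
    return {gram: c / total for gram, c in counts.items()}


def discriminative_ngrams(target, background, n_values=(2, 3),
                          min_ratio=2.0, pseudo=0.01):
    """N-grams enriched in target relative to background.

    Assumption: enrichment is a pseudocount-smoothed frequency ratio;
    the published method may use a different statistic.
    """
    t_freq = window_frequency(target, n_values)
    b_freq = window_frequency(background, n_values)
    score = {g: (f + pseudo) / (b_freq.get(g, 0.0) + pseudo)
             for g, f in t_freq.items()}
    kept = [g for g, s in score.items() if s >= min_ratio]
    return sorted(kept, key=lambda g: (-score[g], g))


def charge_class(gram):
    """Crude physico-chemical label based on charged residues only."""
    has_basic = any(a in BASIC for a in gram)
    has_acidic = any(a in ACIDIC for a in gram)
    if has_basic and not has_acidic:
        return "basic"
    if has_acidic and not has_basic:
        return "acidic"
    return "neutral/mixed"


if __name__ == "__main__":
    # Made-up toy windows around a phosphosite, not real data.
    group_a = ["RRASV", "KRSSL"]
    group_b = ["DESDE", "EESED"]
    for gram in discriminative_ngrams(group_a, group_b, n_values=(2,)):
        print(gram, charge_class(gram))
```

Counting each n-gram once per window, rather than once per occurrence, keeps a single repetitive sequence from dominating the score; this is a design choice of the sketch, not something stated in the abstract.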
The discriminative n-grams were able to classify organisms into their corresponding kingdom/phylum, and they show different patterns among species of different kingdoms/phyla. These regions can contribute to evolutionary divergence because they lie in disordered regions that can evolve rapidly. The differences found possibly reflect group-specific differences in the kinomes of the different groups of species.