Institute of System Biology, Shanghai University, 99 Shangda Road, Shanghai, 200244, China.
J Biomol Struct Dyn. 2011 Apr;28(5):797-804. doi: 10.1080/07391102.2011.10508607.
The small ubiquitin-like modifier (SUMO) proteins can be covalently attached to a wide range of target proteins, and protein sumoylation is an important posttranslational modification. Predicting the sumoylation sites of a given protein is therefore of considerable interest. Here we employed a combined method for this task, predicting sumoylation sites with a two-stage procedure. At the first stage, we predicted whether a protein would be sumoylated at all; at the second stage, if the protein had been predicted to be modified by SUMO, we predicted its sumoylation sites. At the first stage, each protein was encoded by its protein families (PFAM), and the predictor was trained with the nearest neighbor algorithm (NNA). At the second stage, each lysine residue of the protein was represented by a nonapeptide (a peptide of nine residues) containing it, the nonapeptides were encoded with the Amino Acid Index, and the predictor was again trained with NNA. The predictor was evaluated by k-fold cross-validation. The second-stage predictor reached its highest accuracy, 99.55%, when 12 features were used; the corresponding Matthews correlation coefficient was 0.7952. These results indicate that the method is a promising tool for predicting the sumoylation sites of a protein. Finally, the features used in the predictor are discussed. The software is available on request.
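The two-stage control flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: a protein is assumed to be encoded as a presence/absence vector over a fixed list of PFAM families (the abstract does not give the exact encoding), and `stage1` and `stage2` are hypothetical stand-ins for the trained NNA predictors. The family identifiers `PF_A` and `PF_B` are placeholders.

```python
from typing import Callable, Sequence


def pfam_vector(domains: set[str], vocab: Sequence[str]) -> list[int]:
    """Presence/absence encoding: one entry per PFAM family in `vocab`.

    Assumption: a protein is represented by the families it belongs to;
    the paper's exact PFAM encoding is not stated in the abstract.
    """
    return [1 if family in domains else 0 for family in vocab]


def two_stage_predict(
    seq: str,
    domains: set[str],
    vocab: Sequence[str],
    stage1: Callable[[list[int]], bool],
    stage2: Callable[[str, int], bool],
) -> list[int]:
    """Return 0-based positions of lysines predicted to be sumoylated.

    `stage1` and `stage2` are hypothetical stand-ins for trained predictors.
    """
    # Stage 1: protein-level decision from the PFAM encoding.
    if not stage1(pfam_vector(domains, vocab)):
        return []  # protein predicted not sumoylated: stage 2 is skipped
    # Stage 2: site-level decision, evaluated only at lysine (K) positions.
    return [i for i, aa in enumerate(seq) if aa == "K" and stage2(seq, i)]


# Toy stand-in predictors (placeholders, not trained models).
def toy_stage1(vector: list[int]) -> bool:
    return vector[0] == 1  # "sumoylated" if family PF_A is present


def toy_stage2(seq: str, pos: int) -> bool:
    return pos > 2  # arbitrary site rule for illustration only


vocab = ["PF_A", "PF_B"]
print(two_stage_predict("MKAGKLLPQRSTK", {"PF_A"}, vocab, toy_stage1, toy_stage2))  # [4, 12]
print(two_stage_predict("MKAGKLLPQRSTK", {"PF_B"}, vocab, toy_stage1, toy_stage2))  # []
```

Gating the site predictor on the protein-level decision means lysines in proteins judged unmodified are never scored, which is the point of the first stage.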
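The second-stage encoding, a nine-residue window around each lysine mapped through an amino-acid property scale, might look like the sketch below. Several choices here are assumptions, not details from the abstract: the lysine is placed at the centre of the window, positions beyond either terminus are padded with `X` (encoded as 0.0), and the single property used is Kyte-Doolittle hydropathy (AAindex entry KYTJ820101), chosen only for illustration. The abstract does not say which AAindex entries make up the 12 selected features.

```python
# Kyte-Doolittle hydropathy (AAindex KYTJ820101), an illustrative choice only.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def lysine_nonapeptides(seq: str, half_window: int = 4, pad: str = "X") -> list[tuple[int, str]]:
    """Return (0-based position, window) for every lysine (K) in `seq`.

    Assumption: the lysine sits at the centre of the window, with
    `half_window` residues on each side and `pad` beyond the termini.
    """
    padded = pad * half_window + seq + pad * half_window
    width = 2 * half_window + 1  # 9 residues when half_window == 4
    # Residue i of seq is at index i + half_window in `padded`, so the
    # window centred on it starts at index i.
    return [(i, padded[i : i + width]) for i, aa in enumerate(seq) if aa == "K"]


def encode_window(window: str, scale: dict[str, float] = HYDROPATHY) -> list[float]:
    """Map each residue to its property value; padding/unknown residues get 0.0."""
    return [scale.get(aa, 0.0) for aa in window]


for pos, window in lysine_nonapeptides("MKAGKLLPQRSTK"):
    print(pos, window, encode_window(window))
```

With more than one property, the per-property vectors would simply be concatenated; feature selection then decides which of those positions to keep.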
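Both stages train a nearest neighbor predictor and evaluate it by k-fold cross-validation. A minimal sketch follows: each query receives the label of its closest training vector, and every sample is predicted by a model trained on the other k-1 folds. The distance D(x, y) = 1 - x·y / (|x| |y|) is an assumption made here; the abstract does not name the metric, and the value of k is likewise left as a parameter. Ties go to the first training vector encountered.

```python
import math
import random
from typing import Sequence


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """D(x, y) = 1 - x.y / (|x| |y|); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training vector closest to `query`."""
    best = min(range(len(train_x)), key=lambda i: cosine_distance(train_x[i], query))
    return train_y[best]


def k_fold_predictions(xs, ys, k: int, seed: int = 0) -> list:
    """Predict every sample with a model that never saw it (k-fold CV)."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # k disjoint, near-equal folds
    preds = [None] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in fold:
            preds[i] = nearest_neighbor_predict(train_x, train_y, xs[i])
    return preds


# Toy data: two well-separated directions (not real features).
xs = [(1.0, 0.1), (0.9, 0.2), (1.0, 0.0), (0.1, 1.0), (0.2, 0.9), (0.0, 1.0)]
ys = [1, 1, 1, 0, 0, 0]
print(k_fold_predictions(xs, ys, k=3))  # [1, 1, 1, 0, 0, 0]
```

Because the nearest neighbor rule has no fitted parameters, "training" amounts to storing the training fold; the cost is paid at prediction time.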
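The abstract reports both accuracy (99.55%) and the Matthews correlation coefficient (0.7952). The sketch below uses the standard definitions, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). When negatives far outnumber positives, as is typical when most lysines in a protein are not modified, accuracy can be very high while MCC stays moderate. The counts in the usage lines are invented for illustration and do not come from the paper. Returning 0.0 when a marginal count is zero is a common convention, not part of the definition.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary labelling."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient (0.0 if any marginal count is zero)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Invented counts: 10 true sites among 1000 lysines.
counts = (8, 988, 2, 2)  # TP, TN, FP, FN
print(accuracy(*counts))       # 0.996
print(round(mcc(*counts), 4))  # 0.798
```

Here 996 of 1000 predictions are correct, yet two of the ten real sites are missed and two false sites are reported, which MCC reflects and accuracy largely hides.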