Wang Wei, Geng XingBo, Dou Yongchao, Liu Taigang, Zheng Xiaoqi
Department of Mathematics, Shanghai Normal University, Shanghai, China.
Protein Pept Lett. 2011 May;18(5):480-7. doi: 10.2174/092986611794927947.
Information on protein subcellular location plays an important role in molecular cell biology, and predicting the subcellular location of proteins helps in understanding their functions and interactions. In this paper, we propose a different mode of pseudo amino acid composition to represent protein samples for predicting their subcellular localization. The procedure is as follows: based on the optimal splice site of each protein sequence, we divide the sequence into a sorting signal part and a mature protein part, and extract sequence features from each part separately. The combined features are then fed into an SVM classifier to perform the prediction. In a jackknife test on a benchmark dataset in which no protein shares more than 90% pairwise sequence identity with any other, the method achieved overall accuracies of 94.5% for prokaryotic proteins and 90.3% for eukaryotic proteins. These results indicate that the prediction quality of the method is satisfactory, and we anticipate that it may serve as an alternative to existing prediction methods.
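The split-and-combine feature step can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not say how the optimal split site is found, so the site is passed in as a given, and plain 20-dimensional amino acid composition stands in for the paper's pseudo amino acid composition features, which are not specified here. The names `composition` and `split_features` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the 20 amino acid frequencies of a segment (all zeros if empty)."""
    counts = Counter(ch for ch in segment.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def split_features(sequence, split_site):
    """Split a sequence at split_site and concatenate both parts' compositions.

    Assumption: split_site is supplied by the caller; the abstract does not
    describe how the optimal site is determined.
    """
    if not 0 <= split_site <= len(sequence):
        raise ValueError("split_site must lie within the sequence")
    signal_part = sequence[:split_site]   # putative sorting signal
    mature_part = sequence[split_site:]   # putative mature protein
    return composition(signal_part) + composition(mature_part)


# Toy sequence and split site, chosen only for illustration.
features = split_features("MKKLLAAGGSTVWY", split_site=6)
print(len(features))
```

Each protein yields a 40-dimensional vector, 20 values per part, which could then be passed to any SVM implementation for training and prediction.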