a School of Science, Xi'an Polytechnic University, Xi'an 710048, PR China.
b School of Mathematics and Statistics, Xidian University, Xi'an 710071, PR China.
SAR QSAR Environ Res. 2018 Jun;29(6):469-481. doi: 10.1080/1062936X.2018.1459835. Epub 2018 Apr 24.
Gram-negative bacterial secreted proteins play different roles in invaded eukaryotic cells and cause various diseases. Predicting the types of Gram-negative bacterial secreted proteins is a meaningful and challenging task. In this paper, we develop a feature extraction model that combines multiple statistical features: the dipeptide composition (DPC) descriptor and the detrended moving-average auto-cross-correlation analysis (DMACA) descriptor, both computed from the PSI-BLAST profile. A 610-dimensional feature vector was constructed on the training set, and the feature extraction model was denoted DPC-DMACA-PSSM. A support vector machine was then selected as the classifier, and the bias-free jackknife test was used to evaluate accuracy. Our predictor achieves favourable overall accuracy on the test set and outperforms other published approaches. The results show that our approach offers a reliable tool for identifying Gram-negative bacterial secreted protein types.
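The DPC descriptor and the jackknife test mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used DPC-PSSM definition, in which entry (i, j) is the average of P[k][i] * P[k+1][j] over adjacent rows of an L x 20 PSSM, giving 400 values. The abstract does not state how the 610 dimensions are split; 400 DPC values plus 210 DMACA values (20 auto-correlations and 190 cross-correlations over the 20 columns) would be consistent with that total, but this is an assumption. DMACA itself is not implemented here, and a 1-nearest-neighbour rule stands in for the support vector machine so that the example needs only the standard library. The names `dpc_pssm`, `jackknife_accuracy`, and `nearest_neighbour` are hypothetical.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def dpc_pssm(pssm: Matrix) -> Vector:
    """Flattened dipeptide composition of an L x C profile matrix.

    Entry (i, j) averages pssm[k][i] * pssm[k + 1][j] over the L - 1
    adjacent row pairs. With C = 20 columns this yields 400 values,
    independent of the sequence length L.
    """
    length = len(pssm)
    if length < 2:
        raise ValueError("need at least two residues")
    n_cols = len(pssm[0])
    features = []
    for i in range(n_cols):
        for j in range(n_cols):
            total = sum(pssm[k][i] * pssm[k + 1][j] for k in range(length - 1))
            features.append(total / (length - 1))
    return features


def jackknife_accuracy(
    X: List[Vector],
    y: List[str],
    fit_predict: Callable[[List[Vector], List[str], Vector], str],
) -> float:
    """Leave-one-out test: hold out each sample once, train on the rest."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if fit_predict(train_X, train_y, X[i]) == y[i]:
            hits += 1
    return hits / len(X)


def nearest_neighbour(train_X: List[Vector], train_y: List[str], query: Vector) -> str:
    """Stand-in classifier (1-NN, squared Euclidean distance); NOT an SVM."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return train_y[best]
```

In practice each protein's PSSM would come from a PSI-BLAST run and would usually be normalised before feature extraction, and the stand-in classifier would be replaced by a support vector machine; the leave-one-out loop stays the same either way.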