Ding Shuyan, Zhang Shengli
Department of Sciences, Dalian Nationalities University, Dalian 116600, China.
School of Mathematics and Statistics, Xidian University, Xi'an 710071, China.
Biomed Res Int. 2016;2016:3206741. doi: 10.1155/2016/3206741. Epub 2016 Aug 2.
Predicting secreted protein types from sequence data alone remains a challenging problem. In this study, we extract long-range correlation information and linear correlation information from the position-specific scoring matrix (PSSM). A total of 6800 features are extracted at 17 different gaps; 309 of these are then selected by a filter feature selection method applied to the training set. To verify the performance of our method, jackknife and independent dataset tests are performed on the test set, and the reported overall accuracies are 93.60% and 100%, respectively. A comparison with the existing method shows that our method performs favorably for secreted protein type prediction.
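The feature count can be made concrete with a small sketch. The abstract does not give the correlation formulas, so the code below is an assumption, not the authors' method: it treats the PSSM as an L x 20 matrix and, for each gap from 1 to 17, averages the product of column i at position k and column j at position k + gap. That yields 20 x 20 = 400 values per gap, which happens to match the stated total of 17 x 400 = 6800. The function name `gapped_pssm_features` and the toy matrix are hypothetical.

```python
import random


def gapped_pssm_features(pssm, max_gap=17):
    """Flatten lagged column-pair averages of a PSSM into one feature vector.

    Assumed definition (not taken from the paper): for each gap g and each
    ordered column pair (i, j), average pssm[k][i] * pssm[k + g][j] over all
    valid positions k.
    """
    n_rows = len(pssm)
    n_cols = len(pssm[0])
    features = []
    for gap in range(1, max_gap + 1):
        n_pairs = n_rows - gap  # number of position pairs separated by `gap`
        for i in range(n_cols):
            for j in range(n_cols):
                if n_pairs <= 0:
                    # Sequence shorter than the gap: pad with zero.
                    features.append(0.0)
                    continue
                total = sum(pssm[k][i] * pssm[k + gap][j] for k in range(n_pairs))
                features.append(total / n_pairs)
    return features


# Toy stand-in for a real PSSM: 30 residues x 20 amino-acid columns.
random.seed(0)
toy_pssm = [[random.uniform(-5, 5) for _ in range(20)] for _ in range(30)]
features = gapped_pssm_features(toy_pssm)
print(len(features))  # 17 gaps * 20 * 20 column pairs = 6800
```

In a pipeline like the one described, a vector of this kind would then go through a filter feature selection step fitted on the training set only, reducing it to the 309 retained features before classification.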