Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, 610054, Chengdu, China.
Mol Divers. 2010 Nov;14(4):667-71. doi: 10.1007/s11030-009-9205-1. Epub 2009 Nov 12.
Mycobacterium tuberculosis is the primary pathogen causing tuberculosis, one of the most prevalent infectious diseases. The subcellular location of mycobacterial proteins can provide essential clues for protein function research and drug discovery, so a computational method for fast and reliable prediction of these locations is highly desirable. In this study, we developed a support vector machine (SVM) based method to predict the subcellular location of mycobacterial proteins. A total of 444 non-redundant mycobacterial proteins were used to train and test the proposed model with jackknife cross-validation. Using traditional pseudo amino acid composition (PseAAC) as the input features, an overall accuracy of 83.3% was achieved. A feature selection technique was then developed to identify an optimal number of PseAAC features, which improved the overall accuracy from 83.3% to 87.2%. In addition, reduced amino acid information from the N-terminal and non-N-terminal regions of the proteins was combined in the model to further improve the prediction success rate. As a result, a maximum overall accuracy of 91.2% was achieved, with an average accuracy of 79.7%. The proposed model can provide useful information for further experimental research. The prediction model can be accessed free of charge at http://cobi.uestc.edu.cn/cobi/people/hlin/webserver.
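The evaluation protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for the SVM, plain 20-dimensional amino acid composition stands in for PseAAC, and the toy sequences and class names are invented for illustration. The sketch also assumes that "average accuracy" means the unweighted mean of the per-class accuracies, which the abstract does not define. Under that assumption, an overall accuracy well above the average accuracy (91.2% versus 79.7%) would be expected when some location classes are small and harder to predict.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Plain amino acid composition (a simplification, not PseAAC)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]

def nearest_centroid(train_x, train_y, query):
    """Stand-in classifier (NOT an SVM): pick the class with the closest mean vector."""
    groups = {}
    for x, y in zip(train_x, train_y):
        groups.setdefault(y, []).append(x)

    def sq_dist(label):
        rows = groups[label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        return sum((q - c) ** 2 for q, c in zip(query, centroid))

    # sorted() makes tie-breaking deterministic
    return min(sorted(groups), key=sq_dist)

def jackknife_predictions(features, labels, classify):
    """Jackknife (leave-one-out): each sample is predicted by a model built on all others."""
    preds = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(classify(train_x, train_y, features[i]))
    return preds

def overall_accuracy(labels, preds):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(labels, preds)) / len(labels)

def average_accuracy(labels, preds):
    """Unweighted mean of per-class accuracies (assumed definition)."""
    per_class = []
    for c in sorted(set(labels)):
        idx = [i for i, t in enumerate(labels) if t == c]
        per_class.append(sum(preds[i] == c for i in idx) / len(idx))
    return sum(per_class) / len(per_class)

# Hypothetical toy data: two invented classes with clearly different compositions.
sequences = ["LLLLA", "LLLAA", "LLALL", "KKKKE", "KKEKK", "KKKEE"]
labels = ["membrane", "membrane", "membrane", "cytoplasm", "cytoplasm", "cytoplasm"]

features = [composition(s) for s in sequences]
preds = jackknife_predictions(features, labels, nearest_centroid)
print("overall accuracy:", overall_accuracy(labels, preds))
print("average accuracy:", average_accuracy(labels, preds))
```

The two metrics diverge on imbalanced data. For example, if three of four samples belong to one class and every sample is assigned to that class, `overall_accuracy` is 0.75 while `average_accuracy` is 0.5, because the minority class contributes a per-class accuracy of zero.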