Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, 010021 Hohhot, China.
Amino Acids. 2010 Mar;38(3):859-67. doi: 10.1007/s00726-009-0292-1. Epub 2009 Apr 22.
Owing to the complexity of the Plasmodium falciparum genome, predicting the secretory proteins of P. falciparum is more difficult than for other species. In this study, a new K-nearest neighbor method based on the definition of the measure of diversity, K-minimum increment of diversity (K-MID), is introduced to predict secretory proteins. Using amino acid composition as the only input vector, K-MID achieves 88.89% accuracy with a Matthews correlation coefficient (MCC) of 0.78. Several reduced amino acid alphabets are then applied to the same prediction task. The best result, 90.67% accuracy with an MCC of 0.83, is obtained with the 169 dipeptide compositions of the reduced amino acid alphabet derived from the Protein Blocks method.
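The abstract does not give the formulas behind K-MID, so the following is only a minimal sketch under stated assumptions. It assumes the commonly used form of the measure of diversity, D(X) = N log N - Σ n_i log n_i, and of the increment of diversity, ID(X, Y) = D(X + Y) - D(X) - D(Y), and it assumes that K-MID assigns a query to the majority class among the K training vectors with the smallest ID. All function names and the class labels are hypothetical, chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Count vector of the 20 standard amino acids in a sequence."""
    counts = Counter(seq)
    return [counts.get(a, 0) for a in AMINO_ACIDS]


def diversity(counts):
    """Assumed measure of diversity: D(X) = N*log(N) - sum_i n_i*log(n_i)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """Assumed ID(X, Y) = D(X + Y) - D(X) - D(Y).

    The value is zero when both count vectors have the same proportions
    and grows as the two compositions diverge.
    """
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def k_mid_predict(query, training, k=3):
    """Majority label among the k training vectors with the smallest ID.

    `training` is a list of (count_vector, label) pairs.
    """
    ranked = sorted(training, key=lambda item: increment_of_diversity(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In this sketch, a call such as `k_mid_predict(composition(seq), training, k=3)` would label a sequence from its amino acid composition alone. The reduced-alphabet variant described in the abstract would replace `composition` with a 169-dimensional dipeptide count over a 13-letter alphabet; the letter grouping itself is not given in the abstract and is therefore not reproduced here.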