Liao Bo, Jiang Jun-Bao, Zeng Qing-Guang, Zhu Wen
School of Computer and Communication, Hunan University, Changsha, Hunan 410082, China. dragonbw@163.com
Protein Pept Lett. 2011 Nov;18(11):1086-92. doi: 10.2174/092986611797200931.
The function of a protein is closely correlated with its subcellular localization. Probing the mechanism of protein sorting and predicting protein subcellular location can provide important clues and insights for understanding protein function. In this paper, we introduce a new pseudo amino acid composition (PseAAC) approach that encodes a protein sequence based on the physicochemical properties of its amino acid residues. Each protein sample is represented as a 146-dimensional (146D) vector comprising 20 amino acid composition components and 126 adjacent triune residue contents. To evaluate the effectiveness of this encoding scheme, we performed jackknife tests on three datasets using the support vector machine (SVM) algorithm. The overall prediction accuracies were 84.9%, 91.2%, and 92.6%, respectively. These satisfactory results indicate that our method could be a useful tool in bioinformatics and proteomics.
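The 146D encoding summarized above combines global amino acid composition with local statistics over adjacent residue triplets. The Python sketch that follows illustrates that general idea only, under stated assumptions: the 20D composition part is the standard definition, but the residue grouping (`HYPOTHETICAL_GROUPS`, a three-class polarity/charge split) and the helper names `amino_acid_composition`, `grouped_triplet_contents`, and `encode` are illustrative choices, not the authors' physicochemical classification. It therefore yields 20 + 27 = 47 features rather than the paper's 20 + 126 = 146.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# HYPOTHETICAL grouping for illustration only (not the paper's grouping):
# a simple three-class split by side-chain polarity/charge.
HYPOTHETICAL_GROUPS = {
    **{aa: "h" for aa in "AVLIMFWCGP"},  # assumed: hydrophobic / nonpolar
    **{aa: "p" for aa in "STYNQ"},       # assumed: polar, uncharged
    **{aa: "c" for aa in "DEKRH"},       # assumed: charged
}


def amino_acid_composition(seq):
    """20D amino acid composition: fraction of each standard residue."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def grouped_triplet_contents(seq, groups=HYPOTHETICAL_GROUPS):
    """Normalized counts of adjacent residue triplets after mapping residues to classes."""
    classes = sorted(set(groups.values()))
    # One feature per ordered class triplet: k**3 features for k classes.
    keys = ["".join(t) for t in product(classes, repeat=3)]
    # Non-standard letters (e.g. X, U) are skipped before windowing, so a
    # triplet may span a skipped position (a simplifying assumption).
    mapped = "".join(groups[aa] for aa in seq.upper() if aa in groups)
    counts = dict.fromkeys(keys, 0)
    for i in range(len(mapped) - 2):
        counts[mapped[i:i + 3]] += 1
    total = max(len(mapped) - 2, 1)  # avoid division by zero for short sequences
    return [counts[k] / total for k in keys]


def encode(seq):
    """Concatenate composition (20D) and grouped-triplet contents (27D here)."""
    return amino_acid_composition(seq) + grouped_triplet_contents(seq)


if __name__ == "__main__":
    vec = encode("MKTAYIAKQR")
    print(len(vec))  # 47
```

Swapping in a different grouping only changes the mapping dictionary; the number of triplet features is then k^3 for k classes, so reproducing the paper's 126 triplet features would require its actual grouping and counting rules, which the abstract does not state. The resulting vectors could then be fed to any SVM implementation for jackknife (leave-one-out) evaluation.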