Liu Taigang, Tao Peiying, Li Xiaowei, Qin Yufang, Wang Chunhua
College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.
College of Food Science & Technology, Shanghai Ocean University, Shanghai 201306, China.
J Theor Biol. 2015 Feb 7;366:8-12. doi: 10.1016/j.jtbi.2014.11.010. Epub 2014 Nov 20.
Knowledge of apoptosis proteins is important for understanding the mechanism of programmed cell death. Information on the subcellular location of apoptosis proteins helps reveal the apoptosis mechanism and clarify the function of these proteins. Because large-scale wet-bench experiments are costly in time and labor, computational prediction of the subcellular location of apoptosis proteins has become very important, and many computational tools have been developed in recent decades. Existing methods differ in the protein sequence representation techniques and classification algorithms they adopt. In this study, we first introduce a sequence encoding scheme based on tri-grams computed directly from position-specific scoring matrices, which incorporates both the evolutionary information represented in the PSI-BLAST profile and sequence-order information. The SVM-RFE algorithm is then applied for feature selection, and the reduced feature vectors are input to a support vector machine classifier to predict the subcellular location of apoptosis proteins. Jackknife tests on three widely used datasets show that our method achieves state-of-the-art performance in comparison with existing methods.
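The tri-gram encoding described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition in which, for an L x 20 position-specific scoring matrix P, the feature for an amino-acid triple (m, n, r) is the sum over i of P[i, m] * P[i+1, n] * P[i+2, r], giving 20^3 = 8000 values per protein. The abstract does not spell out this formula or any rescaling of the PSI-BLAST scores, so both are assumptions here; the function name `trigram_pssm_features` is hypothetical, and the random matrix merely stands in for a real PSI-BLAST profile.

```python
import numpy as np


def trigram_pssm_features(pssm):
    """Return flattened tri-gram features from a PSSM (assumed definition).

    pssm: array-like of shape (L, A), one row per residue position and one
    column per amino-acid type (A = 20 for a standard PSI-BLAST profile).
    The result has A**3 entries; entry (m, n, r) is
    sum_i pssm[i, m] * pssm[i + 1, n] * pssm[i + 2, r].
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2:
        raise ValueError("pssm must be a 2-D array of shape (L, A)")
    length, n_cols = pssm.shape
    if length < 3:
        # No window of three consecutive positions exists.
        return np.zeros(n_cols ** 3)
    # Multiply columns of three consecutive rows and sum over positions.
    tri = np.einsum("im,in,ir->mnr", pssm[:-2], pssm[1:-1], pssm[2:])
    return tri.ravel()


# Illustration only: a random matrix in place of a real PSI-BLAST profile.
rng = np.random.default_rng(0)
demo_pssm = rng.normal(size=(50, 20))
demo_features = trigram_pssm_features(demo_pssm)
print(demo_features.shape)
```

For the feature selection step, one possible setup (not necessarily the authors') is scikit-learn's `RFE` wrapped around a linear-kernel `SVC`, which recursively drops the features with the smallest weights before the final SVM classifier is trained on the reduced vectors.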