College of Information Science and Technology, Nanjing Agricultural University, Nanjing, 210095, China.
Biomed Res Int. 2019 Jan 30;2019:2436924. doi: 10.1155/2019/2436924. eCollection 2019.
Predicting the subcellular localization of apoptosis proteins plays an important role in understanding the processes of cell proliferation and death. Computational approaches to this problem have recently become popular because traditional biological experiments are too costly and time-consuming to keep pace with the growth of sequence data. To improve prediction accuracy, we propose a sparse coding method combined with a traditional feature extraction algorithm to obtain a sparse representation of apoptosis protein sequences. Multilayer pooling based on dictionaries of different sizes integrates the processed features, and an oversampling approach reduces the influence of unbalanced data sets. The extracted features are then input to a support vector machine to predict the subcellular localization of each apoptosis protein. Results of the jackknife test on two benchmark data sets indicate that our method can significantly improve the accuracy of apoptosis protein subcellular localization prediction.
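The evaluation protocol named in the abstract (oversampling followed by a jackknife, i.e. leave-one-out, test) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes amino acid composition as a stand-in for the sparse-coded features and a nearest-centroid rule in place of the support vector machine. All function names and the toy sequences are hypothetical.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Amino acid composition: fraction of each of the 20 residues.
    Assumption: a simple stand-in for the paper's sparse-coded features."""
    counts = Counter(seq)
    n = max(len(seq), 1)
    return [counts.get(a, 0) / n for a in AMINO_ACIDS]

def oversample(features, labels, rng):
    """Randomly duplicate minority-class samples until all classes match
    the size of the largest class."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

def nearest_centroid(train_x, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class whose mean
    feature vector is closest to x in squared Euclidean distance."""
    sums, counts = {}, Counter(train_y)
    for vec, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v

    def dist(y):
        return sum((s / counts[y] - v) ** 2 for s, v in zip(sums[y], x))

    return min(sums, key=dist)

def jackknife_accuracy(seqs, labels, seed=0):
    """Leave-one-out test: each sequence is held out once, the rest are
    oversampled and used for training, and the held-out one is predicted."""
    rng = random.Random(seed)
    feats = [composition(s) for s in seqs]
    correct = 0
    for i in range(len(feats)):
        train_x = feats[:i] + feats[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        # Oversample only the training fold, so the held-out sample
        # never leaks into its own training set.
        train_x, train_y = oversample(train_x, train_y, rng)
        if nearest_centroid(train_x, train_y, feats[i]) == labels[i]:
            correct += 1
    return correct / len(feats)

# Made-up toy sequences and location labels, for illustration only.
toy_seqs = ["AAAAAG", "AAAAGA", "AAGAAA", "WWWWWY", "WWYWWW"]
toy_labels = ["cyto", "cyto", "cyto", "memb", "memb"]
accuracy = jackknife_accuracy(toy_seqs, toy_labels)
```

Oversampling is applied inside each fold rather than once up front. Duplicating samples before the split would let copies of the held-out sequence appear in its own training set and inflate the jackknife accuracy.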