College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.
Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China.
BMC Genomics. 2018 Jun 19;19(1):478. doi: 10.1186/s12864-018-4849-9.
Apoptosis is associated with several human diseases, including cancer, autoimmune disease, neurodegenerative disease and ischemic damage. Information on the subcellular localization of apoptosis proteins is very important for understanding the mechanism of programmed cell death and for drug development. However, predicting the subcellular localization of apoptosis proteins remains a challenging task.
In this paper, we propose a novel method for predicting apoptosis protein subcellular localization, called PsePSSM-DCCA-LFDA. First, features are extracted from the protein sequences by combining the pseudo-position specific scoring matrix (PsePSSM) with the detrended cross-correlation analysis coefficient (DCCA coefficient); the dimensionality of the extracted feature information is then reduced by local Fisher discriminant analysis (LFDA). Finally, the optimal feature vectors are input to an SVM classifier to predict the subcellular location of the apoptosis proteins. Under the most rigorous jackknife test, overall prediction accuracies of 99.7%, 99.6% and 100% are achieved on the three benchmark datasets, respectively, which is better than other state-of-the-art methods.
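As an illustration of the PsePSSM step named above, the following is a minimal sketch of the commonly used PsePSSM formulation, not the authors' implementation. It assumes an L x 20 PSSM given as a list of rows, and it assumes the raw scores are first mapped into (0, 1) with the logistic function; the function name `pse_pssm` and the parameter `max_lag` are hypothetical names chosen for this example. The first 20 features are the column means of the normalized matrix, and each lag contributes 20 further features, the mean squared difference between scores of residues that are `lag` positions apart.

```python
import math


def pse_pssm(pssm, max_lag):
    """Return a PsePSSM-style feature vector for one protein.

    pssm    : list of L rows, each a list of 20 raw PSSM scores
    max_lag : largest sequence-order lag to include (must be < L)
    The result has n_cols + n_cols * max_lag entries.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("pssm must contain at least one row")
    if not 0 <= max_lag < length:
        raise ValueError("max_lag must satisfy 0 <= max_lag < sequence length")

    # Assumption for this sketch: squash raw scores into (0, 1)
    # with the logistic function before computing features.
    norm = [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]
    n_cols = len(norm[0])

    # Composition part: mean score of each column over the whole sequence.
    features = [sum(row[j] for row in norm) / length for j in range(n_cols)]

    # Sequence-order part: for each lag, the mean squared difference
    # between residues i and i + lag, computed column by column.
    for lag in range(1, max_lag + 1):
        pairs = length - lag
        for j in range(n_cols):
            total = sum(
                (norm[i][j] - norm[i + lag][j]) ** 2 for i in range(pairs)
            )
            features.append(total / pairs)
    return features
```

With a 20-column PSSM and `max_lag = 2`, the returned vector has 20 + 20 × 2 = 60 entries. Because every protein yields a vector of the same length regardless of sequence length, the output can be concatenated with other descriptors before dimensionality reduction and classification.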
The experimental results indicate that our method can significantly improve the prediction accuracy of subcellular localization of apoptosis proteins, and that its accuracy is high enough to make it a promising tool for further proteomics studies. The source code and all datasets are available at https://github.com/QUST-BSBRC/PsePSSM-DCCA-LFDA/ .