College of Chemistry, Sichuan University, Chengdu 610064, PR China.
Comput Biol Chem. 2012 Feb;36:31-5. doi: 10.1016/j.compbiolchem.2011.12.001. Epub 2011 Dec 30.
Signal peptides play a crucial role in various biological processes, such as localization of cell surface receptors, translocation of secreted proteins and cell-cell communication. However, amino acid mutations in signal peptides caused by non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) may lead to loss of function. In the present study, we propose a computational method for predicting deleterious nsSNPs in signal peptides. The method is based on random forest (RF) and incorporates position-specific scoring matrix (PSSM) profiles, SignalP scores and physicochemical properties. These features were optimized by the maximum relevance minimum redundancy (mRMR) method. A cost matrix was then used to reduce the effect of class imbalance, a problem that commonly arises in nsSNP prediction. With an optimal subset of 10 features, the method achieved an overall accuracy of 84.5% and an area under the ROC curve (AUC) of 0.822 in the jackknife test. Furthermore, on the same dataset, we compared our predictor with two existing methods, the R-score-based method and the D-score-based method, and it outperformed both. This performance suggests that our method is effective in predicting deleterious nsSNPs in signal peptides.
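The two evaluation ideas mentioned in the abstract, a cost matrix for imbalanced classes and the jackknife (leave-one-out) test, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the cost values, the toy feature vectors, and the function names are hypothetical, and a simple k-nearest-neighbour scorer stands in for the random forest, whose configuration the abstract does not give.

```python
# Hypothetical cost matrix, indexed as COST[true_label][predicted_label].
# Label 1 = deleterious, label 0 = neutral. Missing a deleterious variant
# is assumed (for illustration only) to cost 3x as much as a false alarm.
COST = [[0.0, 1.0],
        [3.0, 0.0]]


def min_cost_label(p_deleterious, cost=COST):
    """Return the label with the lowest expected cost given P(deleterious)."""
    p = [1.0 - p_deleterious, p_deleterious]
    expected = [sum(p[t] * cost[t][k] for t in range(2)) for k in range(2)]
    return 0 if expected[0] <= expected[1] else 1


def knn_proba(train_X, train_y, x, k=3):
    """Stand-in scorer (NOT a random forest): the fraction of deleterious
    labels among the k training points closest to x."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    return sum(label for _, label in ranked[:k]) / k


def jackknife_accuracy(X, y, fit_predict_proba):
    """Leave-one-out test: hold out each sample once, score it with a model
    built from the remaining samples, and apply the cost-sensitive rule."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        p = fit_predict_proba(train_X, train_y, X[i])
        correct += int(min_cost_label(p) == y[i])
    return correct / len(X)


# Made-up, imbalanced toy data: six neutral and three deleterious samples.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
     [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

print(jackknife_accuracy(X, y, knn_proba))
```

With these assumed costs, the decision threshold on the deleterious-class probability moves from 0.5 down to 0.25, which is how a cost matrix pushes a classifier toward the minority class.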