Key Laboratory of Green Chemistry and Technology, College of Chemistry, Ministry of Education, Sichuan University, Chengdu, 610064, China.
Amino Acids. 2011 Jan;40(1):239-48. doi: 10.1007/s00726-010-0639-7. Epub 2010 Jun 12.
RNA-protein interactions play a pivotal role in various biological processes, such as mRNA processing, protein synthesis, and the assembly and function of the ribosome. In this work, we introduce a computational method for predicting RNA-binding sites in proteins. The method is based on support vector machines and uses a variety of features derived from amino acid sequence information, including position-specific scoring matrix (PSSM) profiles, physicochemical properties, and predicted solvent accessibility. To account for the influence of the residues surrounding an amino acid and the dependency effect from neighboring amino acids, a sliding window and a smoothing window are used to encode the PSSM profiles. The method is evaluated by outer fivefold cross-validation on a data set of 77 RNA-binding proteins (RBP77), where it achieves an overall accuracy of 88.66% with a Matthews correlation coefficient (MCC) of 0.69. Furthermore, an independent data set of 39 RNA-binding proteins (RBP39) is used to further evaluate performance; on it the method achieves an overall accuracy of 82.36% with an MCC of 0.44. These results show that our method generalizes well when predicting RNA-binding sites for novel proteins. Compared with previous methods, our method performs well on the same data set. The prediction results suggest that the features used are effective in predicting RNA-binding sites in proteins. The code and all data sets used in this article are freely available at http://cic.scu.edu.cn/bioinformatics/Predict_RBP.rar.
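The two-window encoding of the PSSM profiles can be illustrated with a minimal sketch. The abstract does not give the window sizes or the exact smoothing rule, so everything below is an assumption made for illustration: smoothing is taken to be the sum of the PSSM rows in a window centered on each residue, the sliding window concatenates the smoothed rows of neighboring residues, positions beyond the sequence ends are zero-padded, and the names `smooth_pssm`, `window_features`, `encode_pssm` and the default sizes 3 and 5 are hypothetical rather than taken from the authors' code.

```python
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per PSSM score


def _pad_rows(m: Matrix, half: int) -> Matrix:
    """Add `half` all-zero rows before and after the matrix (assumed padding rule)."""
    zero = [0.0] * len(m[0])
    return [zero[:] for _ in range(half)] + [row[:] for row in m] + [zero[:] for _ in range(half)]


def smooth_pssm(pssm: Matrix, smooth_w: int = 3) -> Matrix:
    """Replace each row by the column-wise sum of the rows in a centered window."""
    if smooth_w % 2 == 0:
        raise ValueError("window size must be odd")
    padded = _pad_rows(pssm, smooth_w // 2)
    return [
        [sum(col) for col in zip(*padded[i:i + smooth_w])]
        for i in range(len(pssm))
    ]


def window_features(pssm: Matrix, slide_w: int = 5) -> Matrix:
    """Concatenate the rows in a centered sliding window into one vector per residue."""
    if slide_w % 2 == 0:
        raise ValueError("window size must be odd")
    padded = _pad_rows(pssm, slide_w // 2)
    return [
        [value for row in padded[i:i + slide_w] for value in row]
        for i in range(len(pssm))
    ]


def encode_pssm(pssm: Matrix, smooth_w: int = 3, slide_w: int = 5) -> Matrix:
    """Smooth first, then apply the sliding window (assumed order of operations)."""
    return window_features(smooth_pssm(pssm, smooth_w), slide_w)


if __name__ == "__main__":
    toy = [[1.0] * 20 for _ in range(6)]  # toy 6-residue profile, not real data
    feats = encode_pssm(toy)
    print(len(feats), len(feats[0]))  # one 5 * 20 = 100-dimensional vector per residue
```

Under these assumptions each residue yields a vector of `slide_w * 20` values, which would then be joined with the physicochemical and solvent-accessibility features before being passed to the support vector machine.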
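For reference, the two reported metrics, overall accuracy and the Matthews correlation coefficient, can be computed from per-residue binary labels as in the sketch below. The MCC formula is the standard one, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names, the convention of returning 0.0 when the denominator is zero, and the toy labels are assumptions for illustration and are not data from the study.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels, 1 = binding, 0 = non-binding."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when undefined (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels for six residues, purely illustrative.
    y_true = [1, 0, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 1, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print(accuracy(*counts), mcc(*counts))
```

Because binding residues are usually much rarer than non-binding ones, accuracy alone can look high for a predictor that rarely predicts binding; the MCC takes all four confusion counts into account, which is why the two numbers are reported together.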