Predicting protein-binding RNA nucleotides with consideration of binding partners.

Author information

Department of Computer Science and Engineering, Inha University, Incheon, South Korea.

Publication information

Comput Methods Programs Biomed. 2015 Jun;120(1):3-15. doi: 10.1016/j.cmpb.2015.03.010. Epub 2015 Apr 8.

Abstract

In recent years, several computational methods have been developed to predict RNA-binding sites in proteins. Most of these methods do not consider the interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. The converse problem of predicting protein-binding sites in RNA has received little attention, mainly because it is much more difficult and shows lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. To improve the prediction accuracy and usefulness of that method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model, which used both protein and RNA sequence data, achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and a Matthews correlation coefficient (MCC) of 0.69 in 10-fold cross-validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, an NPV of 84.2% and an MCC of 0.48 in independent testing. For comparison, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In 10-fold cross-validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, an NPV of 92.2% and an MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, an NPV of 85.2% and an MCC of 0.45. In both cross-validation and independent testing, the new model that used both RNA and protein sequences performed better than the model that used RNA sequence data alone on most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA that considers the binding partner of the RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure.
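The five performance measures reported in the abstract all follow from the counts of a binary confusion matrix, where a "positive" is a nucleotide labeled as protein-binding. The sketch below shows the standard definitions only; the function name `binding_site_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binding_site_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, PPV, NPV and MCC from raw counts.

    tp/fn: binding nucleotides predicted as binding / non-binding.
    tn/fp: non-binding nucleotides predicted as non-binding / binding.
    """
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),   # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),   # TN / (TN + FP)
        "ppv": ratio(tp, tp + fp),           # TP / (TP + FP)
        "npv": ratio(tn, tn + fn),           # TN / (TN + FN)
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for illustration only (not data from the paper).
example = binding_site_metrics(tp=80, fp=30, tn=170, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because binding nucleotides are typically the minority class, PPV and MCC depend on class balance in a way that sensitivity and specificity do not, which is one reason all five measures are usually reported together.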
