Department of Biotechnology, Indian Institute of Technology Madras, Chennai, India.
PLoS One. 2014 Mar 21;9(3):e91140. doi: 10.1371/journal.pone.0091140. eCollection 2014.
Protein-RNA complexes play key roles in several cellular processes through the interactions of amino acids with RNA. To understand the recognition mechanism, it is important to identify the specific amino acids involved in RNA binding. Various computational methods have been developed for predicting RNA-binding residues from protein sequence. However, their performance depends mainly on the training dataset, the features selected for developing the model, and the learning capacity of the model. Hence, it is important to reveal how the performance of these methods corresponds to the properties of RNA-binding proteins (RBPs). In this work, we collected all available methods for predicting RNA-binding residues and assessed their performance on unbiased, stringent and diverse datasets of RBPs with less than 25% sequence identity, grouped by structural class, fold, superfamily, family, protein function, RNA type, RNA strand and RNA conformation. We identified the best methods for each type of RBP, as well as the types of RBPs for which prediction requires further refinement. We also analyzed the performance of these methods on disordered regions, on structures not included in the training datasets, and on recently solved structures. Choosing a method in this way gives more reliable predictions than randomly choosing any method or combination of methods. This approach would be a valuable resource for biologists to choose the best method based on the type of RBP when designing their experiments, and the tool is freely accessible online at www.iitm.ac.in/bioinfo/RNA-protein/.