School of Informatics, Indiana University Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Nucleic Acids Res. 2011 Apr;39(8):3017-25. doi: 10.1093/nar/gkq1266. Epub 2010 Dec 22.
Mechanistic understanding of many key cellular processes often involves identifying RNA-binding proteins (RBPs) and their RNA-binding sites in two separate steps. Here, they are predicted simultaneously by structural alignment to known protein-RNA complex structures, followed by binding assessment with a DFIRE-based statistical energy function. On a large benchmark of 212 RNA-binding and 6761 non-RNA-binding domains (leave-one-out cross-validation), this method achieves 98% accuracy and 91% precision for predicting RBPs, and 93% accuracy and 78% precision for predicting RNA-binding amino-acid residues. Additional tests revealed that the method makes no false-positive predictions among 311 DNA-binding domains but correctly detects six domains that bind both DNA and RNA. In addition, it correctly identified 31 of 75 unbound RNA-binding domains, with 92% accuracy and 65% precision for the predicted binding residues, and achieved an 86% success rate when applied to the SCOP (Structural Classification of Proteins) RNA-binding domain superfamily. Applied to 2076 structural genomics targets, it predicts 25 as RBPs, of which 20 (80%) are putatively RNA binding. The superior performance over existing methods indicates the importance of dividing structures into domains, using a Z-score to measure relative structural similarity, and using a statistical energy function to measure protein-RNA binding affinity.
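The two-stage decision described above (a structural-similarity Z-score against known protein-RNA complexes, then an energy-based binding check) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the `TemplateHit` record, the cutoff values `Z_CUTOFF` and `E_CUTOFF`, and the rule "keep the highest-Z template that passes both cutoffs" are hypothetical assumptions. The structural alignment and the DFIRE-based energy are taken as precomputed inputs rather than computed here. The accuracy and precision helpers use the standard definitions, which are assumed to match those reported in the abstract.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class TemplateHit:
    """Hypothetical record: one query domain aligned to one known protein-RNA complex."""
    template_id: str
    z_score: float  # relative structural similarity from the structural alignment
    energy: float   # statistical (DFIRE-like) binding energy of the modeled complex; lower = stronger
    contact_residues: FrozenSet[int]  # query residues contacting RNA in the modeled complex


# Hypothetical cutoffs for illustration only; NOT the values used in the paper.
Z_CUTOFF = 3.0
E_CUTOFF = -0.5


def predict(hits: Sequence[TemplateHit],
            z_cut: float = Z_CUTOFF,
            e_cut: float = E_CUTOFF) -> Optional[TemplateHit]:
    """Return the supporting template if the query is predicted to be an RBP, else None."""
    # A template counts only if it is structurally similar AND its modeled complex binds favorably.
    passing = [h for h in hits if h.z_score >= z_cut and h.energy <= e_cut]
    if not passing:
        return None
    # Assumed tie-break: among passing templates, keep the most structurally similar one.
    return max(passing, key=lambda h: h.z_score)


def predicted_binding_residues(hits: Sequence[TemplateHit]) -> FrozenSet[int]:
    """RBP and binding-site prediction in one step: residues come from the chosen template."""
    best = predict(hits)
    return best.contact_residues if best is not None else frozenset()


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct: (TP + TN) / total."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    hits = [
        TemplateHit("tmplA", z_score=4.2, energy=-1.3, contact_residues=frozenset({12, 15, 40})),
        TemplateHit("tmplB", z_score=5.0, energy=0.4, contact_residues=frozenset({3, 7})),
    ]
    best = predict(hits)
    print(best.template_id if best else "non-RBP")  # tmplA: tmplB fails the energy cutoff
    print(sorted(predicted_binding_residues(hits)))  # [12, 15, 40]
    # Structural genomics result from the abstract: 20 putative RBPs among 25 predictions.
    print(precision(20, 5))  # 0.8
```

The example shows why both stages matter: `tmplB` is the closer structural match, but its modeled complex fails the energy check, so the prediction (and the residue set) comes from `tmplA`.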