Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA.
Max Delbrück Center for Molecular Medicine, Berlin-Buch, Berlin, Germany.
BMC Genomics. 2018 Feb 20;19(1):154. doi: 10.1186/s12864-018-4540-1.
RNA-binding proteins (RBPs) play vital roles in many processes in the cell. Different RBPs bind RNA with different sequence and structure specificities. While sequence specificities for a large set of 205 RBPs have been reported through the RNAcompete compendium, structure specificities are known for only a small fraction. The main limitation lies in the design of the RNAcompete technology, which tests RBP binding against unstructured RNA probes, making it difficult to infer structural preferences from these data. We recently developed RCK, an algorithm to infer sequence and structural binding models from RNAcompete data. The set of binding models enables, for the first time, a large-scale assessment of RNA structure in the RBPome.
We re-validate and uncover the role of RNA structure in the RBPome through novel analysis of the largest-scale dataset to date. First, we show that RNA structure exists in presumably unstructured RNA probes and that its variability is correlated with RNA-binding. Second, we examine the structural binding preferences of RBPs and discover an overall preference to bind RNA loops. Third, we significantly improve protein-binding prediction using RNA structure, both in vitro and in vivo. Lastly, we demonstrate that RNA structural binding preferences can be inferred for new proteins solely from their amino acid content.
By demonstrating, counter-intuitively, that we can predict both the RNA structure of these putatively unstructured RNAs and RBP binding to them, we transform a compendium of RNA-binding proteins into a valuable resource for structure-based binding models. We uncover the important role RNA structure plays in protein-RNA interaction for hundreds of RNA-binding proteins.