Zhang Sai, Zhou Jingtian, Hu Hailin, Gong Haipeng, Chen Ligong, Cheng Chao, Zeng Jianyang
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
Department of Pharmacology and Pharmaceutical Sciences, School of Medicine, Tsinghua University, Beijing 100084, China.
Nucleic Acids Res. 2016 Feb 29;44(4):e32. doi: 10.1093/nar/gkv1025. Epub 2015 Oct 13.
RNA-binding proteins (RBPs) play important roles in the post-transcriptional control of RNAs. Identifying RBP binding sites and characterizing RBP binding preferences are key steps toward understanding the basic mechanisms of post-transcriptional gene regulation. Although numerous computational methods have been developed for modeling RBP binding preferences, discovering a complete structural representation of RBP targets by integrating their available structural features in all three dimensions is still a challenging task. In this paper, we develop a general and flexible deep learning framework for modeling structural binding preferences and predicting binding sites of RBPs, which takes (predicted) RNA tertiary structural information into account for the first time. Our framework constructs a unified representation that characterizes the structural specificities of RBP targets in all three dimensions, which can be further used to predict novel candidate binding sites and discover potential binding motifs. Through testing on real CLIP-seq datasets, we have demonstrated that our deep learning framework can automatically extract effective hidden structural features from the encoded raw sequence and structural profiles, and predict accurate RBP binding sites. In addition, we have conducted the first study to show that integrating additional RNA tertiary structural features can improve model performance in predicting RBP binding sites, especially for the polypyrimidine tract-binding protein (PTB), which also provides new evidence to support the view that RBPs may have specific tertiary structural binding preferences. In particular, tests on internal ribosome entry site (IRES) segments yield satisfactory results with experimental support from the literature, and further demonstrate the necessity of incorporating RNA tertiary structural information into the prediction model.
The source code of our approach can be found at https://github.com/thucombio/deepnet-rbp.
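The abstract describes extracting features from "the encoded raw sequence and structural profiles." As a rough illustration only, and not the encoding actually used in deepnet-rbp, the sketch below one-hot encodes an RNA sequence and concatenates each position with a per-nucleotide structural profile (for instance, hypothetical probabilities of a nucleotide lying in a stem or a loop), then flattens the result into a single feature vector that a neural network could take as input. The function names `one_hot` and `unified_input` and the profile format are assumptions made for this example.

```python
from typing import List, Sequence

# Hypothetical alphabet ordering for the one-hot encoding (an assumption).
NUCLEOTIDES = "ACGU"


def one_hot(seq: str) -> List[List[float]]:
    """Encode an RNA sequence as L rows of 4 floats; T is mapped to U and
    unknown characters become all-zero rows."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0.0] * len(NUCLEOTIDES)
        j = NUCLEOTIDES.find(base)
        if j >= 0:
            row[j] = 1.0
        rows.append(row)
    return rows


def unified_input(seq: str, structure_profile: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate, position by position, the one-hot sequence row with the
    hypothetical per-nucleotide structural features, then flatten.

    structure_profile: L rows of k structural features (assumed format).
    """
    seq_rows = one_hot(seq)
    if len(structure_profile) != len(seq_rows):
        raise ValueError("structure profile length must match sequence length")
    vec: List[float] = []
    for s_row, p_row in zip(seq_rows, structure_profile):
        vec.extend(s_row)                     # 4 sequence features
        vec.extend(float(x) for x in p_row)   # k structural features
    return vec
```

For a 4-nt sequence with two structural features per position, the resulting vector has 4 × (4 + 2) = 24 entries. Concatenating per position keeps sequence and structure aligned; a multimodal model could instead process each modality separately before joining them.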