Laboratory of Molecular Neuro-Oncology, Howard Hughes Medical Institute, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA.
Nucleic Acids Res. 2013 Aug;41(14):6793-807. doi: 10.1093/nar/gkt421. Epub 2013 May 18.
Sequence-specific interactions of RNA-binding proteins (RBPs) with their target transcripts are essential for post-transcriptional regulation of gene expression in mammals. However, accurate prediction of RBP motif sites has been difficult because many RBPs recognize short and degenerate sequences. Here we describe mCarts, a hidden Markov model (HMM)-based algorithm that predicts clustered functional RBP-binding sites by integrating the number and spacing of individual motif sites, their accessibility in local RNA secondary structures and their cross-species conservation. The algorithm learns and quantifies the rules governing these features from a large number of in vivo RBP-binding sites obtained from cross-linking and immunoprecipitation data. We applied the algorithm to two representative RBP families, Nova and Mbnl, which regulate tissue-specific alternative splicing by interacting with clustered YCAY and YGCY elements, respectively, and predicted their binding sites in the mouse transcriptome. Despite the low information content of individual motif elements, the algorithm made specific predictions that were successfully validated experimentally. Analysis of the predicted sites also revealed cases of extensive and distal RBP-binding sites important for splicing regulation. The algorithm can be readily applied to other RBPs to infer their RNA-regulatory networks. The software is freely available at http://zhanglab.c2b2.columbia.edu/index.php/MCarts.
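To make the idea of "clustered" motif sites concrete, the following Python sketch scans a sequence for a degenerate motif such as YCAY or YGCY (Y = pyrimidine) and groups nearby matches into clusters. It is only an illustration under stated assumptions: it is not the mCarts HMM, it ignores RNA accessibility and conservation, and the function names `find_motif_sites` and `cluster_sites`, the thresholds `max_gap` and `min_sites`, and the example sequence are all hypothetical choices made for this example.

```python
import re

# IUPAC codes needed for the motifs named in the abstract (Y = pyrimidine).
# U and T are treated as equivalent so RNA or DNA input both work.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "[UT]", "T": "[UT]", "Y": "[CUT]"}


def find_motif_sites(seq, motif):
    """Return 0-based start positions of all (possibly overlapping) motif matches."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def cluster_sites(starts, max_gap=30, min_sites=3):
    """Group sorted site starts into clusters by spacing.

    max_gap and min_sites are illustrative assumptions, not values from mCarts.
    """
    clusters, current = [], []
    for pos in starts:
        # Start a new cluster when the gap to the previous site is too large.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


# Hypothetical sequence: three nearby YCAY sites, then one isolated site.
seq = "GGUCAUUCAUACCAUGG" + "A" * 50 + "UCAUGG"
sites = find_motif_sites(seq, "YCAY")
clusters = cluster_sites(sites)
print(sites)     # [2, 6, 11, 67]
print(clusters)  # [[2, 6, 11]]
```

A fixed spacing threshold like this treats every motif match equally. The approach described in the abstract instead learns how site number, spacing, accessibility and conservation should be weighted from in vivo binding data.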