RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data.

Author information

Orenstein Yaron, Wang Yuhao, Berger Bonnie

Affiliations

Computer Science and Artificial Intelligence Laboratory.

Computer Science and Artificial Intelligence Laboratory Math Department, MIT, Cambridge, MA, USA.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i351-i359. doi: 10.1093/bioinformatics/btw259.

DOI:10.1093/bioinformatics/btw259

PMID:27307637

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4908343/

Abstract

MOTIVATION

Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein-RNA binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state of the art in sequence models, DeepBind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset.

RESULTS

We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer-based model. Remarkably, even though RNAcompete data are designed to be unstructured, RCK can still learn structural preferences from them. RCK significantly outperforms both RNAcontext and DeepBind in in vitro binding prediction for 244 RNAcompete experiments. RCK is also faster and uses less memory, which enables scalability. While RCK is currently on par with existing methods in in vivo binding prediction on a small-scale test, we demonstrate that it will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale.
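To make the phrase "k-mer based model" more concrete, the following is a minimal sketch of what k-mer features over an RNA probe can look like. It is not RCK's actual model or code, which this abstract does not describe. The function names, the example probe, and the per-nucleotide context string (for instance "P" for paired and "U" for unpaired) are all assumptions made purely for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in an RNA sequence.

    Illustrative helper only; not part of the RCK software.
    """
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_context_counts(seq, contexts, k):
    """Count (k-mer, structural-context window) pairs.

    `contexts` is a hypothetical per-nucleotide annotation string of the
    same length as `seq`, e.g. "P" for paired and "U" for unpaired.
    The label alphabet is an assumption, not taken from the paper.
    """
    if len(seq) != len(contexts):
        raise ValueError("seq and contexts must have the same length")
    seq = seq.upper()
    return Counter(
        (seq[i:i + k], contexts[i:i + k])
        for i in range(len(seq) - k + 1)
    )


if __name__ == "__main__":
    probe = "ACGUACG"        # made-up probe, not from RNAcompete
    labels = "PPUUUPP"       # made-up structural annotation
    print(kmer_counts(probe, 3))
    print(kmer_context_counts(probe, labels, 3))
```

A model built on such features would assign a score to each k-mer (optionally per structural context) and combine them over a probe. How RCK defines and learns those scores is described in the full paper, not here.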

AVAILABILITY AND IMPLEMENTATION

Software and models are freely available at http://rck.csail.mit.edu/

CONTACT

bab@mit.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Integrating thermodynamic and sequence contexts improves protein-RNA binding prediction.

PLoS Comput Biol. 2019 Sep 4;15(9):e1007283. doi: 10.1371/journal.pcbi.1007283. eCollection 2019 Sep.

Finding RNA structure in the unstructured RBPome.

BMC Genomics. 2018 Feb 20;19(1):154. doi: 10.1186/s12864-018-4540-1.

A comparative analysis of RNA-binding proteins binding models learned from RNAcompete, RNA Bind-n-Seq and eCLIP data.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab149.

RNAcompete-S: Combined RNA sequence/structure preferences for RNA binding proteins derived from a single-step in vitro selection.

Methods. 2017 Aug 15;126:18-28. doi: 10.1016/j.ymeth.2017.06.024. Epub 2017 Jun 24.

Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins.

Nat Biotechnol. 2009 Jul;27(7):667-70. doi: 10.1038/nbt.1550. Epub 2009 Jun 28.

RNAcontext: a new method for learning the sequence and structure binding preferences of RNA-binding proteins.

PLoS Comput Biol. 2010 Jul 1;6(7):e1000832. doi: 10.1371/journal.pcbi.1000832.

RNAcompete methodology and application to determine sequence preferences of unconventional RNA-binding proteins.

Methods. 2017 Apr 15;118-119:3-15. doi: 10.1016/j.ymeth.2016.12.003. Epub 2016 Dec 10.

GraphProt: modeling binding preferences of RNA-binding proteins.

Genome Biol. 2014 Jan 22;15(1):R17. doi: 10.1186/gb-2014-15-1-r17.

A deep neural network approach for learning intrinsic protein-RNA binding preferences.

Bioinformatics. 2018 Sep 1;34(17):i638-i646. doi: 10.1093/bioinformatics/bty600.

Cited by

EPIFBMC: A New Model for Enhancer-Promoter Interaction Prediction.

Int J Mol Sci. 2025 Aug 20;26(16):8035. doi: 10.3390/ijms26168035.

A resource of RNA-binding protein motifs across eukaryotes reveals evolutionary dynamics and gene-regulatory function.

Nat Biotechnol. 2025 Jul 25. doi: 10.1038/s41587-025-02733-6.

rbpTransformer: A novel deep learning model for prediction of piRNA and mRNA bindings.

PLoS One. 2025 Jun 25;20(6):e0324462. doi: 10.1371/journal.pone.0324462. eCollection 2025.

iCRBP-LKHA: Large convolutional kernel and hybrid channel-spatial attention for identifying circRNA-RBP interaction sites.

PLoS Comput Biol. 2024 Aug 22;20(8):e1012399. doi: 10.1371/journal.pcbi.1012399. eCollection 2024 Aug.

DeepFusion: A deep bimodal information fusion network for unraveling protein-RNA interactions using in vivo RNA structures.

Comput Struct Biotechnol J. 2023 Dec 30;23:617-625. doi: 10.1016/j.csbj.2023.12.040. eCollection 2024 Dec.

CircSI-SSL: circRNA-binding site identification based on self-supervised learning.

Bioinformatics. 2024 Jan 2;40(1). doi: 10.1093/bioinformatics/btae004.

KDeep: a new memory-efficient data extraction method for accurately predicting DNA/RNA transcription factor binding sites.

J Transl Med. 2023 Oct 16;21(1):727. doi: 10.1186/s12967-023-04593-7.

CircSSNN: circRNA-binding site prediction via sequence self-attention neural networks with pre-normalization.

BMC Bioinformatics. 2023 May 30;24(1):220. doi: 10.1186/s12859-023-05352-7.

PrismNet: predicting protein-RNA interaction using in vivo RNA structural information.

Nucleic Acids Res. 2023 Jul 5;51(W1):W468-W477. doi: 10.1093/nar/gkad353.

ResidualBind: Uncovering Sequence-Structure Preferences of RNA-Binding Proteins with Deep Neural Networks.

Methods Mol Biol. 2023;2586:197-215. doi: 10.1007/978-1-0716-2768-6_12.

References

Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data.

BMC Bioinformatics. 2015 Nov 9;16:375. doi: 10.1186/s12859-015-0797-4.

Genome-wide analysis of YB-1-RNA interactions reveals a novel role of YB-1 in miRNA processing in glioblastoma multiforme.

Nucleic Acids Res. 2015 Sep 30;43(17):8516-28. doi: 10.1093/nar/gkv779. Epub 2015 Aug 3.

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning.

Nat Biotechnol. 2015 Aug;33(8):831-8. doi: 10.1038/nbt.3300. Epub 2015 Jul 27.

The MEME Suite.

Nucleic Acids Res. 2015 Jul 1;43(W1):W39-49. doi: 10.1093/nar/gkv416. Epub 2015 May 7.

Structural imprints in vivo decode RNA regulatory mechanisms.

Nature. 2015 Mar 26;519(7544):486-90. doi: 10.1038/nature14263. Epub 2015 Mar 18.

A census of human RNA-binding proteins.

Nat Rev Genet. 2014 Dec;15(12):829-45. doi: 10.1038/nrg3813. Epub 2014 Nov 4.

The RNA shapes studio.

Bioinformatics. 2015 Feb 1;31(3):423-5. doi: 10.1093/bioinformatics/btu649. Epub 2014 Oct 1.

Context-dependent control of alternative splicing by RNA-binding proteins.

Nat Rev Genet. 2014 Oct;15(10):689-701. doi: 10.1038/nrg3778. Epub 2014 Aug 12.

RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins.

Mol Cell. 2014 Jun 5;54(5):887-900. doi: 10.1016/j.molcel.2014.04.016. Epub 2014 May 15.

'Oming in on RNA-protein interactions.

Genome Biol. 2014 Jan 31;15(1):401. doi: 10.1186/gb4158.
