RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data.

Author information

Orenstein Yaron, Wang Yuhao, Berger Bonnie

Affiliations

Computer Science and Artificial Intelligence Laboratory.

Computer Science and Artificial Intelligence Laboratory; Math Department, MIT, Cambridge, MA, USA.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i351-i359. doi: 10.1093/bioinformatics/btw259.

Abstract

MOTIVATION

Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein-RNA binding. Among these, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art sequence model, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, as its developers note, GraphProt cannot detect structural preferences from RNAcompete data because of the data's unstructured nature, nor can it be run tractably on the full RNAcompete dataset.

RESULTS

We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer-based model. Remarkably, even though RNAcompete data are designed to be unstructured, RCK can still learn structural preferences from them. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. RCK is also faster and uses less memory, which enables it to scale. Although RCK is currently on par with existing methods for in vivo binding prediction in a small-scale test, we demonstrate that it will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale.
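The abstract does not spell out the form of RCK's k-mer model. Purely as an illustration of what a "sequence- and structure-based k-mer model" can look like, the sketch below scores one RNA probe with a linear model in which every (k-mer, structural context) pair carries its own weight, and each k-mer occurrence contributes those weights scaled by how likely it is to lie in each context. The context labels, the averaging of per-nucleotide context probabilities, and the function name `kmer_structure_score` are hypothetical assumptions for this sketch, not RCK's published formulation.

```python
# Hypothetical structural context labels (an assumption for illustration only).
CONTEXTS = ("paired", "hairpin", "unpaired_other")


def kmer_structure_score(seq, context_probs, weights, k):
    """Score one RNA probe under a simple k-mer-by-context linear model.

    seq           -- RNA sequence, e.g. "ACGU"
    context_probs -- one dict per nucleotide mapping each context to a probability
    weights       -- dict mapping (kmer, context) -> weight; missing pairs count as 0
    k             -- k-mer length
    """
    if len(context_probs) != len(seq):
        raise ValueError("need one context-probability dict per nucleotide")
    total = 0.0
    # Slide a window of length k over the probe.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        for ctx in CONTEXTS:
            # Assumed approximation: the k-mer's probability of being in context
            # ctx is the mean of its nucleotides' per-position probabilities.
            p = sum(context_probs[j][ctx] for j in range(i, i + k)) / k
            total += weights.get((kmer, ctx), 0.0) * p
    return total


# Example with made-up per-nucleotide context probabilities for "ACGU".
probs = [
    {"paired": 1.0, "hairpin": 0.0, "unpaired_other": 0.0},
    {"paired": 0.5, "hairpin": 0.5, "unpaired_other": 0.0},
    {"paired": 0.5, "hairpin": 0.5, "unpaired_other": 0.0},
    {"paired": 0.0, "hairpin": 1.0, "unpaired_other": 0.0},
]
weights = {("CG", "paired"): 2.0, ("GU", "hairpin"): 1.0}
print(kmer_structure_score("ACGU", probs, weights, k=2))  # 1.75
```

Here `CG` is paired with average probability 0.5 and `GU` sits in a hairpin with average probability 0.75, so the score is 2.0 × 0.5 + 1.0 × 0.75 = 1.75. In a model of this shape, learning from RNAcompete would amount to fitting `weights` so that scores track the measured probe intensities; how RCK actually parameterizes and fits its model is described in the paper, not here.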

AVAILABILITY AND IMPLEMENTATION

Software and models are freely available at http://rck.csail.mit.edu/

CONTACT

bab@mit.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.