Predicting RNA-protein interactions using only sequence information.

Author information

Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, USA.

Publication information

BMC Bioinformatics. 2011 Dec 22;12:489. doi: 10.1186/1471-2105-12-489.

Abstract

BACKGROUND

RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High-throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but they are expensive and time-consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions.

RESULTS

We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved AUC (Area Under the Receiver Operating Characteristic (ROC) curve) values of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens.
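
The sequence encoding described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions, because the abstract does not state them: the seven amino-acid groups (a conjoint-triad style grouping is used here), the normalization (counts divided by the total number of counted k-mers), the handling of non-standard residues (windows containing them are skipped), and all function names, which are hypothetical.

```python
from itertools import product

RNA_ALPHABET = "ACGU"

# ASSUMPTION: the abstract does not list the 7 groups; this grouping is
# a conjoint-triad style reduction chosen only for illustration.
PROTEIN_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(PROTEIN_GROUPS) for aa in group}
REDUCED_ALPHABET = "0123456"


def kmer_composition(seq, alphabet, k):
    """Return k-mer frequencies of seq, ordered as itertools.product(alphabet, repeat=k)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with unknown symbols are skipped
            counts[window] += 1
    total = sum(counts.values())
    # ASSUMPTION: "normalized" is taken to mean dividing by the total count.
    return [counts[km] / total if total else 0.0 for km in kmers]


def encode_rna(rna):
    """4-mer composition of an RNA sequence: 4**4 = 256 features."""
    return kmer_composition(rna.upper().replace("T", "U"), RNA_ALPHABET, 4)


def encode_protein(protein):
    """3-mer composition over the reduced 7-letter alphabet: 7**3 = 343 features."""
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in protein.upper())
    return kmer_composition(reduced, REDUCED_ALPHABET, 3)


def encode_pair(rna, protein):
    """Concatenate both encodings into one 599-dimensional feature vector."""
    return encode_rna(rna) + encode_protein(protein)


if __name__ == "__main__":
    features = encode_pair("ACGUACGUGGCA", "MKVLAAGIRKDEC")
    print(len(features))
```

Under these assumptions, each RNA-protein pair maps to a fixed-length vector (256 + 343 = 599 features) regardless of sequence length, which is the kind of input an SVM or Random Forest classifier expects.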

CONCLUSIONS

Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
