Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure.

Affiliations

School of Computer Science and Engineering, Central South University, Changsha, 410075, China.

Aliyun School of Big Data, Changzhou University, Changzhou, 213164, China.

Publication information

BMC Genomics. 2020 Dec 17;21(Suppl 13):866. doi: 10.1186/s12864-020-07239-w.

Abstract

BACKGROUND

RNA-binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. Identifying RBP binding sites is a crucial step in understanding the biological mechanisms of post-transcriptional gene regulation. However, determining RBP binding sites on a large scale is challenging because of the high cost of biochemical assays. Many studies have therefore used machine learning methods to predict binding sites. In particular, deep learning is increasingly used in bioinformatics by virtue of its ability to learn generalized representations from DNA and protein sequences.

RESULTS

In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representations of k-mer sequences rather than traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing methods on two large-scale benchmark datasets.
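
The k-mer step described above can be illustrated with a minimal sketch. It is not the DeepRKE implementation: the k-mer length (3), the stride (1), the embedding dimension (4), the toy sequence, and the helper names `kmerize`, `build_vocab`, and `init_embeddings` are all assumptions made for illustration. The embedding vectors are random placeholders, whereas a real pipeline would learn them with a word embedding algorithm before passing them to the CNN and BiLSTM layers.

```python
import random

def kmerize(seq, k=3, stride=1):
    """Split a sequence into overlapping k-mers (the 'words' to be embedded)."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def build_vocab(token_lists):
    """Assign each distinct k-mer an integer id, in first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def init_embeddings(vocab, dim=4, seed=0):
    """Random placeholder vectors; a real model would learn these."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in vocab]

rna = "AUGGCUACG"            # toy sequence, not taken from the benchmark datasets
tokens = kmerize(rna)        # overlapping 3-mers
vocab = build_vocab([tokens])
table = init_embeddings(vocab)

# Each k-mer is replaced by its dense vector: a (num_kmers x dim) matrix
# that a downstream CNN/BiLSTM could consume.
encoded = [table[vocab[t]] for t in tokens]
```

The same tokenization could in principle be applied to a secondary-structure annotation string, though the structure alphabet used by the authors is not stated in the abstract.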

CONCLUSIONS

Our extensive experimental results show that DeepRKE is an effective tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can capture latent relationships and similarities between k-mers, and thus improve predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/.
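
To see why a distributed representation can express similarity between k-mers while one-hot encoding cannot, consider the following toy comparison. The dense vectors are hand-picked for illustration only; they are not learned embeddings and do not come from the paper. The point is structural: any two distinct one-hot vectors are orthogonal, so their cosine similarity is always zero, whereas dense vectors can place related k-mers close together.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

kmers = ["AUG", "AUC", "GGC"]

# One-hot: one dimension per k-mer, a single 1.0 at the k-mer's own index.
one_hot = {
    kmer: [1.0 if i == j else 0.0 for j in range(len(kmers))]
    for i, kmer in enumerate(kmers)
}

# Hand-picked toy vectors (assumption: NOT learned, NOT from the paper).
dense = {
    "AUG": [0.9, 0.1],
    "AUC": [0.8, 0.2],
    "GGC": [-0.7, 0.6],
}

sim_one_hot = cosine(one_hot["AUG"], one_hot["AUC"])   # orthogonal vectors
sim_dense_close = cosine(dense["AUG"], dense["AUC"])   # nearby vectors
sim_dense_far = cosine(dense["AUG"], dense["GGC"])     # distant vectors
```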
