RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences.

Authors

Camargo Antonio P, Sourkov Vsevolod, Pereira Gonçalo A G, Carazzolle Marcelo F

Affiliations

Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, University of Campinas, Campinas, SP, 13083-862, Brazil.

Department of Computer Science, ReDNA Labs, Pattaya, Chonburi, 20150, Thailand.

Publication information

NAR Genom Bioinform. 2020 Jan 13;2(1):lqz024. doi: 10.1093/nargab/lqz024. eCollection 2020 Mar.

Abstract

The advent of high-throughput sequencing technologies has made it possible to obtain large volumes of genetic information quickly and inexpensively. As a result, many efforts are devoted to unveiling the biological roles of genomic elements, and distinguishing protein-coding from long non-coding RNAs is one of the most important of these tasks. We describe RNAsamba, a tool that predicts the coding potential of RNA molecules from sequence information using a neural network-based approach that models both the whole sequence and the ORF to identify patterns that distinguish coding from non-coding transcripts. We evaluated RNAsamba's classification performance on transcripts from humans and several other model organisms and show that it consistently outperforms other state-of-the-art methods. Our results also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, indicating that its algorithm does not depend on complete transcript sequences. Furthermore, RNAsamba can predict small ORFs, which are traditionally identified with ribosome profiling experiments. We believe that RNAsamba will enable faster and more accurate biological findings from genomic data of species that are being sequenced for the first time. A user-friendly web interface, documentation with instructions for local installation and usage, and the source code of RNAsamba can be found at https://rnasamba.lge.ibi.unicamp.br/.
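To make the "whole sequence plus ORF" idea concrete, the following minimal Python sketch extracts the longest forward-strand ORF from a transcript, which is the kind of second input a model could receive alongside the full sequence. It is an illustration only and not RNAsamba's implementation: the function name `longest_orf` is hypothetical, the sketch assumes the standard start codon (ATG) and stop codons (TAA, TAG, TGA), and it considers only complete ORFs, whereas the abstract notes that RNAsamba also handles partial-length ORFs.

```python
# Illustrative sketch only; not taken from the RNAsamba source code.
# Assumes the standard genetic code: ATG start, TAA/TAG/TGA stops.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(sequence: str) -> str:
    """Return the longest complete ORF (start through stop codon, inclusive)
    found in the three forward reading frames, or "" if there is none."""
    # Accept RNA or DNA alphabets by normalising U to T.
    seq = sequence.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None  # index of the first in-frame ATG not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    transcript = "CCAUGAAAUGA"  # hypothetical toy transcript
    print(longest_orf(transcript))
```

A classifier in this style would then be given both `transcript` and the result of `longest_orf(transcript)`; how RNAsamba actually encodes and combines the two inputs is described in the paper and documentation, not here.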
