A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential.

Affiliations

School of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331, USA.

Department of Biochemistry and Biophysics, Oregon State University, 2011 Ag & Life Sciences Bldg, Corvallis, OR 97331, USA.

Publication information

Nucleic Acids Res. 2018 Sep 19;46(16):8105-8113. doi: 10.1093/nar/gky567.

DOI:10.1093/nar/gky567

PMID:29986088

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6144860/

Abstract

The current deluge of newly identified RNA transcripts presents a singular opportunity for improved assessment of coding potential, a cornerstone of genome annotation, and for machine-driven discovery of biological knowledge. While traditional, feature-based methods for RNA classification are limited by current scientific knowledge, deep learning methods can independently discover complex biological rules in the data de novo. We trained a gated recurrent neural network (RNN) on human messenger RNA (mRNA) and long noncoding RNA (lncRNA) sequences. Our model, mRNA RNN (mRNN), surpasses state-of-the-art methods at predicting protein-coding potential despite being trained with less data and with no prior concept of what features define mRNAs. To understand what mRNN learned, we probed the network and uncovered several context-sensitive codons highly predictive of coding potential. Our results suggest that gated RNNs can learn complex and long-range patterns in full-length human transcripts, making them ideal for performing a wide range of difficult classification tasks and, most importantly, for harvesting new biological insights from the rising flood of sequencing data.
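As a rough illustration of what "a gated RNN reading a transcript" means, the sketch below runs a single gated recurrent unit (GRU) layer over a one-hot encoded nucleotide sequence and squashes the final hidden state into a score between 0 and 1. It is not the authors' mRNN architecture: the class name `TinyGRUClassifier`, the hidden size, the omission of bias terms, and the random, untrained weights are all assumptions made for illustration, so the score carries no biological meaning until such a model is trained on labelled mRNA and lncRNA sequences.

```python
import math
import random

NUCLEOTIDES = "ACGT"  # assumption: sequences use A/C/G/T; U is mapped to T


def one_hot(base):
    """Encode one nucleotide as a 4-element vector; unknown bases become all zeros."""
    base = "T" if base.upper() == "U" else base.upper()
    return [1.0 if base == n else 0.0 for n in NUCLEOTIDES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyGRUClassifier:
    """Hypothetical single-layer GRU with a logistic output (untrained, random weights)."""

    def __init__(self, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One input matrix (W) and one recurrent matrix (U) per gate:
        # "z" = update gate, "r" = reset gate, "h" = candidate state. Biases are omitted.
        self.W = {g: random_matrix(hidden_size, 4, rng) for g in "zrh"}
        self.U = {g: random_matrix(hidden_size, hidden_size, rng) for g in "zrh"}
        self.out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

    def step(self, x, h):
        """Advance the hidden state by one nucleotide."""
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], gated))]
        # Update gate interpolates between the old state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def coding_score(self, sequence):
        """Read the whole sequence, then map the final hidden state to (0, 1)."""
        h = [0.0] * self.hidden_size
        for base in sequence:
            h = self.step(one_hot(base), h)
        return sigmoid(sum(w * hi for w, hi in zip(self.out, h)))


if __name__ == "__main__":
    model = TinyGRUClassifier()
    print(model.coding_score("AUGGCCAAGUAA"))  # arbitrary value: the weights are random
```

Because the hidden state is carried across every position, a trained model of this kind can in principle relate signals that lie far apart in a transcript, which is the property the abstract refers to as learning long-range patterns.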

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1740/6144860/760e21dad956/gky567fig1.jpg

Similar articles

A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential.

Nucleic Acids Res. 2018 Sep 19;46(16):8105-8113. doi: 10.1093/nar/gky567.

Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task.

PLoS Comput Biol. 2023 Oct 12;19(10):e1011526. doi: 10.1371/journal.pcbi.1011526. eCollection 2023 Oct.

A deep learning method for lincRNA detection using auto-encoder algorithm.

BMC Bioinformatics. 2017 Dec 6;18(Suppl 15):511. doi: 10.1186/s12859-017-1922-3.

DeepCPP: a deep neural network based on nucleotide bias information and minimum distribution similarity feature selection for RNA coding potential prediction.

Brief Bioinform. 2021 Mar 22;22(2):2073-2084. doi: 10.1093/bib/bbaa039.

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme.

BMC Bioinformatics. 2014 Sep 19;15(1):311. doi: 10.1186/1471-2105-15-311.

Comparing biological information contained in mRNA and non-coding RNAs for classification of lung cancer patients.

BMC Cancer. 2019 Dec 3;19(1):1176. doi: 10.1186/s12885-019-6338-1.

A Methodology to Study Pseudogenized lincRNAs.

Methods Mol Biol. 2021;2324:49-63. doi: 10.1007/978-1-0716-1503-4_4.

Feature extraction approaches for biological sequences: a comparative study of mathematical features.

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab011.

A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts.

BMC Genomics. 2017 Oct 18;18(1):804. doi: 10.1186/s12864-017-4178-4.

BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.

Nucleic Acids Res. 2018 Sep 19;46(16):e96. doi: 10.1093/nar/gky462.

Cited by

Longitudinal host-microbiome dynamics of metatranscription identify hallmarks of progression in periodontitis.

Microbiome. 2025 May 14;13(1):119. doi: 10.1186/s40168-025-02108-8.

Machine learning approaches enable the discovery of therapeutics across domains.

Mol Ther. 2025 May 7;33(5):2269-2278. doi: 10.1016/j.ymthe.2025.04.001. Epub 2025 Apr 3.

Establishing a GRU-GCN coordination-based prediction model for miRNA-disease associations.

BMC Genom Data. 2025 Jan 14;26(1):4. doi: 10.1186/s12863-024-01293-z.

Deciphering 3'UTR Mediated Gene Regulation Using Interpretable Deep Representation Learning.

Adv Sci (Weinh). 2024 Oct;11(39):e2407013. doi: 10.1002/advs.202407013. Epub 2024 Aug 19.

Current understanding of functional peptides encoded by lncRNA in cancer.

Cancer Cell Int. 2024 Jul 19;24(1):252. doi: 10.1186/s12935-024-03446-7.

Big data and deep learning for RNA biology.

Exp Mol Med. 2024 Jun;56(6):1293-1321. doi: 10.1038/s12276-024-01243-w. Epub 2024 Jun 14.

ntEmbd: Deep learning embedding for nucleotide sequences.

bioRxiv. 2024 May 2:2024.04.30.591806. doi: 10.1101/2024.04.30.591806.

Evaluating generalizability of artificial intelligence models for molecular datasets.

bioRxiv. 2024 Feb 28:2024.02.25.581982. doi: 10.1101/2024.02.25.581982.

A task-specific encoding algorithm for RNAs and RNA-associated interactions based on convolutional autoencoder.

Nucleic Acids Res. 2023 Nov 27;51(21):e110. doi: 10.1093/nar/gkad929.

Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task.

PLoS Comput Biol. 2023 Oct 12;19(10):e1011526. doi: 10.1371/journal.pcbi.1011526. eCollection 2023 Oct.
