Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task.

Affiliations

School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America.

Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, United States of America.

Publication information

PLoS Comput Biol. 2023 Oct 12;19(10):e1011526. doi: 10.1371/journal.pcbi.1011526. eCollection 2023 Oct.

DOI:10.1371/journal.pcbi.1011526
PMID:37824580
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10597526/
Abstract

Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
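
The three-nucleotide periodicity mentioned in the abstract can be illustrated with the kind of classical spectral analysis the authors cite as inspiration. The following is a minimal sketch, not the LFNet architecture itself: it builds one binary indicator track per nucleotide, takes a discrete Fourier transform of each, and reports the share of non-DC spectral power that falls at frequency 1/3. The function name `period3_power` and the choice of normalization are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def period3_power(seq: str) -> float:
    """Share of non-DC spectral power at frequency 1/3.

    Illustrative only: sums the power spectra of the four binary
    nucleotide indicator tracks and compares the bin at N/3 with all
    positive-frequency bins.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    n = len(seq) - len(seq) % 3          # trim so N/3 is an exact DFT bin
    if n < 3:
        raise ValueError("sequence must contain at least one codon")

    peak = 0.0
    total = 0.0
    for base in "ACGT":
        # 1.0 where the base occurs, 0.0 elsewhere
        track = np.array([1.0 if c == base else 0.0 for c in seq[:n]])
        power = np.abs(np.fft.fft(track)) ** 2
        peak += power[n // 3]                 # frequency 1/3
        total += power[1 : n // 2 + 1].sum()  # positive frequencies, no DC
    return float(peak / total) if total > 0 else 0.0
```

A perfectly codon-periodic toy sequence such as `"GCA" * 30` puts all of its non-DC power at frequency 1/3, while a random sequence spreads power across the spectrum. Real coding sequences fall somewhere in between, which is why a learned model with a bias toward this frequency band may be more useful than a fixed threshold on the score.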

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d0/10597526/bf2ac6f13422/pcbi.1011526.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d0/10597526/69042bf0a961/pcbi.1011526.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d0/10597526/630d091b8318/pcbi.1011526.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d0/10597526/816e6b90aac8/pcbi.1011526.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d0/10597526/0be6dda75e2b/pcbi.1011526.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d0/10597526/7deec35b226d/pcbi.1011526.g006.jpg

Similar articles

1
Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task.
PLoS Comput Biol. 2023 Oct 12;19(10):e1011526. doi: 10.1371/journal.pcbi.1011526. eCollection 2023 Oct.
2
Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task.
bioRxiv. 2023 Apr 19:2023.04.03.535488. doi: 10.1101/2023.04.03.535488.
3
A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential.
Nucleic Acids Res. 2018 Sep 19;46(16):8105-8113. doi: 10.1093/nar/gky567.
4
Machine Learning-Based Annotation of Long Noncoding RNAs Using PLncPRO.
Methods Mol Biol. 2020;2107:253-260. doi: 10.1007/978-1-0716-0235-5_12.
5
A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts.
BMC Genomics. 2017 Oct 18;18(1):804. doi: 10.1186/s12864-017-4178-4.
6
PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme.
BMC Bioinformatics. 2014 Sep 19;15(1):311. doi: 10.1186/1471-2105-15-311.
7
DeepCPP: a deep neural network based on nucleotide bias information and minimum distribution similarity feature selection for RNA coding potential prediction.
Brief Bioinform. 2021 Mar 22;22(2):2073-2084. doi: 10.1093/bib/bbaa039.
8
Prediction of Long Non-Coding RNAs Based on Deep Learning.
Genes (Basel). 2019 Apr 3;10(4):273. doi: 10.3390/genes10040273.
9
Comparative analysis of protein-coding and long non-coding transcripts based on RNA sequence features.
J Bioinform Comput Biol. 2018 Apr;16(2):1840013. doi: 10.1142/S0219720018400139.
10
A deep learning method for lincRNA detection using auto-encoder algorithm.
BMC Bioinformatics. 2017 Dec 6;18(Suppl 15):511. doi: 10.1186/s12859-017-1922-3.

Cited by

1
Dysregulation of alternative splicing patterns in the ovaries of reproductively aged mice.
bioRxiv. 2025 May 23:2025.05.19.654918. doi: 10.1101/2025.05.19.654918.
