DeepCPP: a deep neural network based on nucleotide bias information and minimum distribution similarity feature selection for RNA coding potential prediction.

Affiliations

School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore.

School of Mathematical Sciences, Dalian University of Technology, No.2 Linggong Road, Dalian, China.

Publication information

Brief Bioinform. 2021 Mar 22;22(2):2073-2084. doi: 10.1093/bib/bbaa039.

DOI: 10.1093/bib/bbaa039
PMID: 32227075
Abstract

The development of deep sequencing technologies has led to the discovery of novel transcripts. Many in silico methods have been developed to assess the coding potential of these transcripts so that their functions can be investigated further. Existing methods perform well at distinguishing most long noncoding RNAs (lncRNAs) from coding RNAs (mRNAs) but poorly on RNAs with small open reading frames (sORFs). Here, we present DeepCPP (deep neural network for coding potential prediction), a deep learning method for RNA coding potential prediction. Extensive evaluations on four previous datasets and six new datasets constructed in different species show that DeepCPP outperforms other state-of-the-art methods, especially on sORF-type data, where it overcomes the bottleneck of sORF mRNA identification by improving accuracy by more than 4.31, 37.24 and 5.89% on newly discovered human, vertebrate and insect data, respectively. Additionally, we show that discontinuous k-mers and our newly proposed nucleotide bias information and minimum distribution similarity feature selection method play crucial roles in this classification problem. Taken together, DeepCPP is an effective method for RNA coding potential prediction.
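
The abstract names discontinuous k-mers as one of the informative feature types but does not define them. One common reading is a k-mer in which some positions inside the window are skipped. The Python sketch below illustrates that reading only; it is not the DeepCPP implementation, and the function name `gapped_kmer_frequencies` and the `mask` parameter are hypothetical names introduced here for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def gapped_kmer_frequencies(seq, mask="101"):
    """Relative frequency of every gapped k-mer selected by `mask`.

    Assumption (not taken from the paper): `mask` is a string of '1' and '0',
    where '1' keeps a position in the sliding window and '0' skips it.
    For example, "101" pairs each nucleotide with the one two positions ahead.
    """
    # Treat RNA and DNA alike by mapping U to T.
    seq = seq.upper().replace("U", "T")
    keep = [i for i, m in enumerate(mask) if m == "1"]
    span = len(mask)

    counts = Counter()
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        key = "".join(window[i] for i in keep)
        # Ignore windows whose kept positions hold ambiguous bases such as N.
        if set(key) <= set(ALPHABET):
            counts[key] += 1

    total = sum(counts.values())
    # Return a fixed-length feature dictionary covering all 4^k possible keys.
    keys = ["".join(p) for p in product(ALPHABET, repeat=len(keep))]
    return {k: (counts[k] / total if total else 0.0) for k in keys}


if __name__ == "__main__":
    features = gapped_kmer_frequencies("ACGACG", mask="101")
    print(features["AG"], features["CA"], features["GC"])  # 0.5 0.25 0.25
```

Returning a value for every possible key, including zeros, gives each transcript a feature vector of the same length, which is what a downstream classifier or feature selection step would typically expect.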


Similar articles

1. DeepCPP: a deep neural network based on nucleotide bias information and minimum distribution similarity feature selection for RNA coding potential prediction.
Brief Bioinform. 2021 Mar 22;22(2):2073-2084. doi: 10.1093/bib/bbaa039.
2. csORF-finder: an effective ensemble learning framework for accurate identification of multi-species coding short open reading frames.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac392.
3. Prediction of Long Non-Coding RNAs Based on Deep Learning.
Genes (Basel). 2019 Apr 3;10(4):273. doi: 10.3390/genes10040273.
4. Mining for missed sORF-encoded peptides.
Expert Rev Proteomics. 2019 Mar;16(3):257-266. doi: 10.1080/14789450.2019.1571919. Epub 2019 Feb 13.
5. A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential.
Nucleic Acids Res. 2018 Sep 19;46(16):8105-8113. doi: 10.1093/nar/gky567.
6. LncRNAnet: long non-coding RNA identification using deep learning.
Bioinformatics. 2018 Nov 15;34(22):3889-3897. doi: 10.1093/bioinformatics/bty418.
7. lncRNA_Mdeep: An Alignment-Free Predictor for Distinguishing Long Non-Coding RNAs from Protein-Coding Transcripts by Multimodal Deep Learning.
Int J Mol Sci. 2020 Jul 23;21(15):5222. doi: 10.3390/ijms21155222.
8. Global analysis of ribosome-associated noncoding RNAs unveils new modes of translational regulation.
Proc Natl Acad Sci U S A. 2017 Nov 14;114(46):E10018-E10027. doi: 10.1073/pnas.1708433114. Epub 2017 Oct 30.
9. Class similarity network for coding and long non-coding RNA classification.
BMC Bioinformatics. 2021 Dec 20;22(1):609. doi: 10.1186/s12859-021-04517-6.
10. In-depth characterization and identification of translatable lncRNAs.
Comput Biol Med. 2023 Sep;164:107243. doi: 10.1016/j.compbiomed.2023.107243. Epub 2023 Jul 8.

Cited by

1. SORFPP: Enhancing rich sequence-driven information to identify SEPs based on fused framework on validation datasets.
PLoS One. 2025 Apr 28;20(4):e0320314. doi: 10.1371/journal.pone.0320314. eCollection 2025.
2. Popcorn: prediction of short coding and noncoding genomic sequences in prokaryotes.
Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf250.
3. Small ORFs, Big Insights: as a Model to Unraveling Microprotein Functions.
Cells. 2024 Oct 2;13(19):1645. doi: 10.3390/cells13191645.
4. misORFPred: A Novel Method to Mine Translatable sORFs in Plant Pri-miRNAs Using Enhanced Scalable k-mer and Dynamic Ensemble Voting Strategy.
Interdiscip Sci. 2025 Mar;17(1):114-133. doi: 10.1007/s12539-024-00661-8. Epub 2024 Oct 14.
5. PSPI: A deep learning approach for prokaryotic small protein identification.
Front Genet. 2024 Jul 10;15:1439423. doi: 10.3389/fgene.2024.1439423. eCollection 2024.
6. Current understanding of functional peptides encoded by lncRNA in cancer.
Cancer Cell Int. 2024 Jul 19;24(1):252. doi: 10.1186/s12935-024-03446-7.
7. A survey of experimental and computational identification of small proteins.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae345.
8. sOCP: a framework predicting smORF coding potential based on TIS and in-frame features and effectively applied in the human genome.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae147.
9. No country for old methods: New tools for studying microproteins.
iScience. 2024 Jan 20;27(2):108972. doi: 10.1016/j.isci.2024.108972. eCollection 2024 Feb 16.
10. A task-specific encoding algorithm for RNAs and RNA-associated interactions based on convolutional autoencoder.
Nucleic Acids Res. 2023 Nov 27;51(21):e110. doi: 10.1093/nar/gkad929.