CodAn: predictive models for precise identification of coding regions in eukaryotic transcripts.

Publication information

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa045.

DOI: 10.1093/bib/bbaa045
PMID: 32460307
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8138839/

Abstract

MOTIVATION

Characterization of the coding sequences (CDSs) is an essential step in transcriptome annotation. Incorrect identification of CDSs can lead to the prediction of non-existent proteins that can eventually compromise knowledge if databases are populated with similar incorrect predictions made in different genomes. Also, the correct identification of CDSs is important for the characterization of the untranslated regions (UTRs), which are known to be important regulators of the mRNA translation process. Considering this, we present CodAn (Coding sequence Annotator), a new approach to predict confident CDS and UTR regions in full or partial transcriptome sequences in eukaryote species.

RESULTS

Our analysis revealed that CodAn performs confident predictions on full-length and partial transcripts with the strand sense of the CDS known or unknown. The comparative analysis showed that CodAn presents better overall performance than other approaches, mainly when considering the correct identification of the full CDS (i.e. correct identification of the start and stop codons). In this sense, CodAn is the best tool to be used in projects involving transcriptomic data.
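
To make the evaluation criterion above concrete, here is a minimal sketch of what "correct identification of the full CDS" means in terms of coordinates. It is a naive longest-ORF baseline and is not CodAn's method, which this abstract does not describe. The function names (`longest_orf`, `split_transcript`, `full_cds_match`) and the toy sequence are assumptions made for illustration, and the scan covers only the forward strand of a full-length transcript.

```python
# Naive baseline for illustration only; this is NOT CodAn's algorithm.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and the span includes the
    stop codon. Returns None when no complete ORF exists.
    """
    seq = transcript.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None  # first ATG seen since the last stop codon in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def split_transcript(transcript: str):
    """Split a transcript into 5' UTR, CDS, and 3' UTR using longest_orf."""
    span = longest_orf(transcript)
    if span is None:
        return None
    start, end = span
    return {
        "five_prime_utr": transcript[:start],
        "cds": transcript[start:end],
        "three_prime_utr": transcript[end:],
    }


def full_cds_match(predicted, reference):
    """Count a prediction as a full-CDS hit only if both boundaries agree."""
    return predicted is not None and predicted == reference


if __name__ == "__main__":
    toy = "GGCACC" + "ATGGCTGAATTTTAA" + "CCGTTT"  # hypothetical transcript
    print(longest_orf(toy))
    print(split_transcript(toy))
```

Under this strict criterion, a prediction with the right stop codon but the wrong start codon is scored as a miss, and it also shifts the inferred 5' UTR boundary. The sketch does not handle partial transcripts or unknown strand sense, the two cases the abstract highlights.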

AVAILABILITY

CodAn is freely available at https://github.com/pedronachtigall/CodAn.

CONTACT

aland@usp.br.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Briefings in Bioinformatics online.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9b1/8138839/a22099146185/bbaa045f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9b1/8138839/01f2d52c89a4/bbaa045f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9b1/8138839/2198ce912f26/bbaa045f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9b1/8138839/6397965cbe31/bbaa045f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9b1/8138839/1e3c89ee48eb/bbaa045f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9b1/8138839/48b5c01d0f81/bbaa045f6.jpg

Similar articles

1. CodAn: predictive models for precise identification of coding regions in eukaryotic transcripts.
   Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa045.
2. Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes.
   Bioinformatics. 2012 Dec 1;28(23):3042-50. doi: 10.1093/bioinformatics/bts582. Epub 2012 Oct 7.
3. FRAMA: from RNA-seq data to annotated mRNA assemblies.
   BMC Genomics. 2016 Jan 14;17:54. doi: 10.1186/s12864-015-2349-8.
4. utr.annotation: a tool for annotating genomic variants that could influence post-transcriptional regulation.
   Bioinformatics. 2021 Nov 5;37(21):3926-3928. doi: 10.1093/bioinformatics/btab635.
5. IdentiCS--identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence.
   BMC Bioinformatics. 2004 Aug 16;5:112. doi: 10.1186/1471-2105-5-112.
6. uPEPperoni: an online tool for upstream open reading frame location and analysis of transcript conservation.
   BMC Bioinformatics. 2014 Feb 1;15:36. doi: 10.1186/1471-2105-15-36.
7. [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
   Yi Chuan Xue Bao. 2004 May;31(5):431-43.
8. Characterization of 954 bovine full-CDS cDNA sequences.
   BMC Genomics. 2005 Nov 23;6:166. doi: 10.1186/1471-2164-6-166.
9. PACCMIT/PACCMIT-CDS: identifying microRNA targets in 3' UTRs and coding sequences.
   Nucleic Acids Res. 2015 Jul 1;43(W1):W474-9. doi: 10.1093/nar/gkv457. Epub 2015 May 6.
10. Full-length ribosome density prediction by a multi-input and multi-output model.
   PLoS Comput Biol. 2021 Mar 26;17(3):e1008842. doi: 10.1371/journal.pcbi.1008842. eCollection 2021 Mar.

Cited by

1. Chromosome-Contiguous Reference Genome for to Underpin Future Discovery Research.
   Int J Mol Sci. 2025 Jul 3;26(13):6417. doi: 10.3390/ijms26136417.
2. Cell wall modulation by drought and elevated CO2 in sugarcane leaves.
   Front Plant Sci. 2025 Apr 30;16:1567201. doi: 10.3389/fpls.2025.1567201. eCollection 2025.
3. Building better genome annotations across the tree of life.
   Genome Res. 2025 May 2;35(5):1261-1276. doi: 10.1101/gr.280377.124.
4. Technological breakthroughs and advancements in the application of mRNA vaccines: a comprehensive exploration and future prospects.
   Front Immunol. 2025 Mar 4;16:1524317. doi: 10.3389/fimmu.2025.1524317. eCollection 2025.
5. gymnotoa-db: a database and application to optimize functional annotation in gymnosperms.
   Database (Oxford). 2025 Mar 5;2025. doi: 10.1093/database/baaf019.
6. TIdeS: A Comprehensive Framework for Accurate Open Reading Frame Identification and Classification in Eukaryotic Transcriptomes.
   Genome Biol Evol. 2024 Dec 4;16(12). doi: 10.1093/gbe/evae252.
7. HPC-T-Annotator: an HPC tool for de novo transcriptome assembly annotation.
   BMC Bioinformatics. 2024 Aug 21;25(1):272. doi: 10.1186/s12859-024-05887-3.
8. Patterns of molecular evolution in a parthenogenic terrestrial isopod ().
   PeerJ. 2024 Jul 23;12:e17780. doi: 10.7717/peerj.17780. eCollection 2024.
9. A survey of experimental and computational identification of small proteins.
   Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae345.
10. Stress-induced nuclear translocation of ONAC023 improves drought and heat tolerance through multiple processes in rice.
   Nat Commun. 2024 Jul 13;15(1):5877. doi: 10.1038/s41467-024-50229-9.

References

1. De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers.
   Gigascience. 2019 May 1;8(5). doi: 10.1093/gigascience/giz039.
2. CPPred: coding potential prediction based on the global description of RNA sequence.
   Nucleic Acids Res. 2019 May 7;47(8):e43. doi: 10.1093/nar/gkz087.
3. What Are 3' UTRs Doing?
   Cold Spring Harb Perspect Biol. 2019 Oct 1;11(10):a034728. doi: 10.1101/cshperspect.a034728.
4. Cellular stress alters 3'UTR landscape through alternative polyadenylation and isoform-specific degradation.
   Nat Commun. 2018 Jun 11;9(1):2268. doi: 10.1038/s41467-018-04730-7.
5. BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.
   Nucleic Acids Res. 2018 Sep 19;46(16):e96. doi: 10.1093/nar/gky462.
6. The exon-intron gene structure upstream of the initiation codon predicts translation efficiency.
   Nucleic Acids Res. 2018 May 18;46(9):4575-4591. doi: 10.1093/nar/gky282.
7. APAtrap: identification and quantification of alternative polyadenylation sites from RNA-seq data.
   Bioinformatics. 2018 Jun 1;34(11):1841-1849. doi: 10.1093/bioinformatics/bty029.
8. ExUTR: a novel pipeline for large-scale prediction of 3'-UTR sequences from NGS data.
   BMC Genomics. 2017 Nov 6;18(1):847. doi: 10.1186/s12864-017-4241-1.
9. A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts.
   BMC Genomics. 2017 Oct 18;18(1):804. doi: 10.1186/s12864-017-4178-4.
10. Translational control by 5'-untranslated regions of eukaryotic mRNAs.
   Science. 2016 Jun 17;352(6292):1413-6. doi: 10.1126/science.aad9868.