Large-scale prokaryotic gene prediction and comparison to genome annotation.

Authors

Nielsen Pernille, Krogh Anders

Affiliation

Bioinformatics Centre, Institute of Molecular Biology and Physiology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark.

Publication details

Bioinformatics. 2005 Dec 15;21(24):4322-9. doi: 10.1093/bioinformatics/bti701. Epub 2005 Oct 25.

DOI: 10.1093/bioinformatics/bti701
PMID: 16249266

Abstract

MOTIVATION

Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. It makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome comparison either on a large or small scale would be facilitated by using a single standard for annotation, which incorporates a transparency of why an open reading frame (ORF) is considered to be a gene.

RESULTS

A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to approximately 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms that too many short genes are annotated in numerous organisms. Furthermore, genes might be missing in the annotation of some of the genomes. We predict 41 of 143 genomes to be over-annotated by >5%, meaning that too many ORFs are annotated as genes. We also predict that 12 of 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation.
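
The over- and under-annotation criterion described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so several details here are assumptions made for illustration: genes are matched by strand and stop coordinate (so start-codon disagreements are ignored), the difference is normalized by the number of annotated genes, and the 5% cutoff is applied symmetrically to under-annotation. The function names `annotation_balance` and `classify` are hypothetical and are not part of EasyGene.

```python
from typing import Set, Tuple

# Assumption: a gene is identified by (strand, stop coordinate), so two calls
# that differ only in their start codon count as the same gene.
Gene = Tuple[str, int]


def annotation_balance(annotated: Set[Gene], predicted: Set[Gene]) -> float:
    """Signed fractional difference between annotation and prediction.

    Positive values mean more annotated-only genes than predicted-only genes
    (possible over-annotation); negative values suggest under-annotation.
    Normalizing by the annotated gene count is an assumption of this sketch.
    """
    if not annotated:
        raise ValueError("annotated gene set is empty")
    annotated_only = annotated - predicted   # annotated, not found by the predictor
    predicted_only = predicted - annotated   # predicted, missing from the annotation
    return (len(annotated_only) - len(predicted_only)) / len(annotated)


def classify(balance: float, threshold: float = 0.05) -> str:
    """Label a genome using a symmetric threshold (the symmetry is assumed)."""
    if balance > threshold:
        return "over-annotated"
    if balance < -threshold:
        return "under-annotated"
    return "consistent"


if __name__ == "__main__":
    # Toy data: 100 annotated genes, 92 predicted genes, 90 shared.
    annotated = {("+", i) for i in range(100)}
    predicted = {("+", i) for i in range(10, 102)}
    balance = annotation_balance(annotated, predicted)
    print(f"balance = {balance:.2f}, genome is {classify(balance)}")
```

In the toy data, 10 annotated genes are not predicted and 2 predicted genes are not annotated, so the balance is (10 - 2) / 100 = 0.08, which exceeds the 5% cutoff.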

Similar articles

1
Large-scale prokaryotic gene prediction and comparison to genome annotation.
Bioinformatics. 2005 Dec 15;21(24):4322-9. doi: 10.1093/bioinformatics/bti701. Epub 2005 Oct 25.
2
Missing genes in the annotation of prokaryotic genomes.
BMC Bioinformatics. 2010 Mar 15;11:131. doi: 10.1186/1471-2105-11-131.
3
[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
Yi Chuan Xue Bao. 2004 May;31(5):431-43.
4
GeneLook: a novel ab initio gene identification system suitable for automated annotation of prokaryotic sequences.
Gene. 2005 Feb 14;346:115-25. doi: 10.1016/j.gene.2004.10.018. Epub 2005 Jan 26.
5
Biomediator data integration and inference for functional annotation of anonymous sequences.
Pac Symp Biocomput. 2007:343-54.
6
EasyGene--a prokaryotic gene finder that ranks ORFs by statistical significance.
BMC Bioinformatics. 2003 Jun 3;4:21. doi: 10.1186/1471-2105-4-21.
7
8
GenColors: accelerated comparative analysis and annotation of prokaryotic genomes at various stages of completeness.
Bioinformatics. 2005 Sep 15;21(18):3669-71. doi: 10.1093/bioinformatics/bti606. Epub 2005 Aug 2.
9
GenColors: annotation and comparative genomics of prokaryotes made easy.
Methods Mol Biol. 2007;395:75-96.
10
MICheck: a web tool for fast checking of syntactic annotations of bacterial genomes.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W471-9. doi: 10.1093/nar/gki498.

Cited by

1
Comprehensive genome analysis of subsp. in camels from Saudi Arabia: Molecular epidemiology and antimicrobial resistance.
Vet World. 2025 Apr;18(4):859-876. doi: 10.14202/vetworld.2025.859-876. Epub 2025 Apr 19.
2
In silico analysis of Ffp1, an ancestral Porphyromonas spp. fimbrillin, shows differences with Fim and Mfa.
Access Microbiol. 2024 Jul 11;6(7). doi: 10.1099/acmi.0.000771.v3. eCollection 2024.
3
Proteins à la carte: riboproteogenomic exploration of bacterial N-terminal proteoform expression.
mBio. 2024 Apr 10;15(4):e0033324. doi: 10.1128/mbio.00333-24. Epub 2024 Mar 21.
4
Hidden in plain sight: challenges in proteomics detection of small ORF-encoded polypeptides.
Microlife. 2022 May 14;3:uqac005. doi: 10.1093/femsml/uqac005. eCollection 2022.
5
Proteogenomic Analysis Provides Novel Insight into Genome Annotation and Nitrogen Metabolism in sp. PCC 7120.
Microbiol Spectr. 2021 Oct 31;9(2):e0049021. doi: 10.1128/Spectrum.00490-21. Epub 2021 Sep 15.
6
ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs.
Microbiome. 2021 Jun 28;9(1):149. doi: 10.1186/s40168-021-01092-z.
7
The Phosin PptA Plays a Negative Role in the Regulation of Antibiotic Production in .
Antibiotics (Basel). 2021 Mar 20;10(3):325. doi: 10.3390/antibiotics10030325.
8
Evolution of Microbial Genomics: Conceptual Shifts over a Quarter Century.
Trends Microbiol. 2021 Jul;29(7):582-592. doi: 10.1016/j.tim.2021.01.005. Epub 2021 Feb 1.
9
Arginine-Rich Small Proteins with a Domain of Unknown Function, DUF1127, Play a Role in Phosphate and Carbon Metabolism of Agrobacterium tumefaciens.
J Bacteriol. 2020 Oct 22;202(22). doi: 10.1128/JB.00309-20.
10
Achieving Accurate Sequence and Annotation Data for Caulobacter vibrioides CB13.
Curr Microbiol. 2018 Dec;75(12):1642-1648. doi: 10.1007/s00284-018-1572-3. Epub 2018 Sep 26.