Reduce manual curation by combining gene predictions from multiple annotation engines, a case study of start codon prediction.

Affiliation

Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands.

Publication information

PLoS One. 2013 May 10;8(5):e63523. doi: 10.1371/journal.pone.0063523. Print 2013.

DOI:10.1371/journal.pone.0063523

PMID:23675487

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3651085/

Abstract

Nowadays, prokaryotic genomes are sequenced faster than their gene annotations can be manually curated. Automated genome annotation engines (AGEs) provide users with a straightforward and complete solution for predicting open reading frame (ORF) coordinates and function. For many labs, the use of AGEs is therefore essential to decrease the time necessary for annotating a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions. Combining multiple AGEs might allow for more accurate predictions. Here we analyzed the ab initio ORF calling performance of different AGEs based on curated genome annotations of eight strains from different bacterial species with GC content ranging from 35% to 52%. We present a case study which demonstrates a novel way of comparative genome annotation, using combinations of AGEs in a pre-defined order (or path) to predict ORF start codons. The AGE combinations are ordered from high to low specificity, where the specificity is based on the eight genome annotations. For each AGE combination we are able to derive a so-called projected confidence value, which is the average specificity of ORF start codon prediction based on the eight genomes. The projected confidence enables estimating the likelihood of a correct prediction for a particular ORF start codon by a particular AGE combination, pinpointing ORFs whose start codons are notoriously difficult to predict. We correctly predict start codons for 90.5±4.8% of the genes in a genome (based on the eight genomes) with an accuracy of 81.1±7.6%. Our consensus-path methodology allows a marked improvement over majority voting (9.7±4.4%), and with an optimal path, ORF start prediction sensitivity is gained while maintaining a high specificity.
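The consensus-path idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that an AGE combination "fires" when all of its engines agree on the same start coordinate for an ORF, and that the first combination on the path that fires determines the call. The engine names (`engine_a`, `engine_b`, `engine_c`), the function names, and all specificity values are hypothetical placeholders, not data from the paper.

```python
from statistics import mean

# Hypothetical per-genome specificities for each AGE combination.
# Illustrative numbers only; they are NOT taken from the paper.
SPECIFICITY_PER_GENOME = {
    ("engine_a", "engine_b", "engine_c"): [0.97, 0.95, 0.96],
    ("engine_a", "engine_b"): [0.92, 0.90, 0.91],
    ("engine_c",): [0.80, 0.78, 0.82],
}


def build_path(spec_table):
    """Order AGE combinations from high to low projected confidence.

    The projected confidence of a combination is the average of its
    start codon prediction specificity over the curated genomes.
    """
    projected = {combo: mean(vals) for combo, vals in spec_table.items()}
    return sorted(projected.items(), key=lambda kv: kv[1], reverse=True)


def call_start(predictions, path):
    """Return (start, projected_confidence) for a single ORF.

    `predictions` maps an engine name to its predicted start coordinate;
    an engine that made no call for this ORF is simply absent.
    Assumption: the first combination on the path whose engines all
    agree on one start coordinate decides the call.
    """
    for combo, confidence in path:
        starts = {predictions.get(engine) for engine in combo}
        if len(starts) == 1 and None not in starts:
            return starts.pop(), confidence
    # No combination on the path produced an agreed start.
    return None, 0.0


path = build_path(SPECIFICITY_PER_GENOME)
start, confidence = call_start(
    {"engine_a": 100, "engine_b": 100, "engine_c": 130}, path
)
print(start, round(confidence, 2))
```

In this toy input the three-engine combination disagrees, so the call falls through to the two-engine combination, and the ORF inherits that combination's lower projected confidence. ORFs that only resolve late on the path (or not at all) are the ones a curator would inspect first.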

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c515/3651085/05d5c08f5bb4/pone.0063523.g001.jpg
