Evaluating high-throughput ab initio gene finders to discover proteins encoded in eukaryotic pathogen genomes missed by laboratory techniques.

Affiliation

School of Medical and Molecular Sciences, and the Ithree Institute at the University of Technology Sydney-UTS, New South Wales, Australia.

Publication information

PLoS One. 2012;7(11):e50609. doi: 10.1371/journal.pone.0050609. Epub 2012 Nov 30.

PMID: 23226328

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3511556/

Abstract

Next generation sequencing technology is advancing genome sequencing at an unprecedented level. By unravelling the code within a pathogen's genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stages and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab initio gene finders use statistical models to predict genes and their exon-intron structures from the genome sequence alone. This paper evaluates whether existing ab initio gene finders can effectively predict genes to deduce proteins that laboratory techniques have so far failed to capture. A further aim is to identify patterns of prediction inaccuracy common to gene finders as a whole, irrespective of the target pathogen. All currently available ab initio gene finders are considered in the evaluation, but only four offer high-throughput capability: AUGUSTUS, GeneMark_hmm, GlimmerHMM, and SNAP. These gene finders require training data specific to a target pathogen, so the evaluation results are inextricably linked to the availability and quality of those data. The pathogen Toxoplasma gondii is used to illustrate the evaluation methods. The results support the current view that exons predicted by ab initio gene finders are inaccurate in the absence of experimental evidence. However, the results also reveal patterns of inaccuracy common to all gene finders, and these may indicate where future gene finder development should focus.
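The abstract reports that predicted exons are inaccurate without experimental evidence but does not state which accuracy measures were used. As a hedged sketch, the snippet below computes two measures commonly used in gene-prediction benchmarks: exon-level sensitivity and specificity (a predicted exon counts only if both boundaries exactly match a reference exon) and nucleotide-level sensitivity and specificity. The interval convention (1-based, inclusive, one contig and strand), the function names and the toy coordinates are assumptions for illustration, not the paper's method or data.

```python
from typing import Iterable, Set, Tuple

# Assumed convention (hypothetical): an exon is (start, end), 1-based and
# inclusive, with all exons on the same contig and strand.
Exon = Tuple[int, int]


def exon_accuracy(predicted: Iterable[Exon], reference: Iterable[Exon]) -> Tuple[float, float]:
    """Exon-level (sensitivity, specificity).

    A predicted exon is correct only when both boundaries exactly match a
    reference exon. 'Specificity' follows gene-prediction usage: the fraction
    of predicted exons that are correct.
    """
    pred, ref = set(predicted), set(reference)
    correct = len(pred & ref)
    sensitivity = correct / len(ref) if ref else 0.0
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


def _bases(exons: Iterable[Exon]) -> Set[int]:
    # Expand each inclusive interval into the set of coding positions it covers.
    return {pos for start, end in exons for pos in range(start, end + 1)}


def nucleotide_accuracy(predicted: Iterable[Exon], reference: Iterable[Exon]) -> Tuple[float, float]:
    """Nucleotide-level (sensitivity, specificity) over coding positions."""
    pred, ref = _bases(predicted), _bases(reference)
    overlap = len(pred & ref)
    sensitivity = overlap / len(ref) if ref else 0.0
    specificity = overlap / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    reference = [(100, 250), (400, 520), (700, 910)]
    predicted = [(100, 250), (400, 530), (700, 910), (1200, 1300)]
    print(exon_accuracy(predicted, reference))
    print(nucleotide_accuracy(predicted, reference))
```

On the toy data, one exon end shifted by 10 bp plus one spurious exon lowers exon-level sensitivity to 2/3 and specificity to 1/2, while nucleotide-level sensitivity stays at 1.0 (specificity 483/594, about 0.81). This is why exon-level scores are the stricter test of predicted gene structures.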

Similar articles

TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders.

Bioinformatics. 2004 Nov 1;20(16):2878-9. doi: 10.1093/bioinformatics/bth315. Epub 2004 May 14.

Computational methods for ab initio and comparative gene finding.

Methods Mol Biol. 2010;609:269-84. doi: 10.1007/978-1-60327-241-4_16.

Genome-wide analyses reveal genes subject to positive selection in Toxoplasma gondii.

Gene. 2019 May 30;699:73-79. doi: 10.1016/j.gene.2019.03.008. Epub 2019 Mar 9.

Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data.

BMC Bioinformatics. 2017 Jan 27;18(Suppl 1):1426. doi: 10.1186/s12859-016-1426-6.

Computational analysis and experimental validation of gene predictions in Toxoplasma gondii.

PLoS One. 2008;3(12):e3899. doi: 10.1371/journal.pone.0003899. Epub 2008 Dec 9.

mGene: accurate SVM-based gene finding with an application to nematode genomes.

Genome Res. 2009 Nov;19(11):2133-43. doi: 10.1101/gr.090597.108. Epub 2009 Jun 29.

AUGUSTUS: ab initio prediction of alternative transcripts.

Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W435-9. doi: 10.1093/nar/gkl200.

GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

Genome Res. 2024 Jun 25;34(5):757-768. doi: 10.1101/gr.278373.123.

Whole-genome sequencing of a Toxoplasma gondii strain from a Turkish isolate using next-generation sequencing technology.

Acta Trop. 2021 Jun;218:105907. doi: 10.1016/j.actatropica.2021.105907. Epub 2021 Mar 28.

Cited by

GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

Genome Res. 2024 Jun 25;34(5):757-768. doi: 10.1101/gr.278373.123.

Comparative Genome Annotation.

Methods Mol Biol. 2024;2802:165-187. doi: 10.1007/978-1-0716-3838-5_7.

A new gene finding tool GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

bioRxiv. 2024 Apr 17:2023.01.13.524024. doi: 10.1101/2023.01.13.524024.

Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes.

BMC Bioinformatics. 2020 Nov 10;21(1):513. doi: 10.1186/s12859-020-03855-1.

Using AnABlast for intergenic sORF prediction in the Caenorhabditis elegans genome.

Bioinformatics. 2020 Dec 8;36(19):4827-4832. doi: 10.1093/bioinformatics/btaa608.

A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms.

BMC Genomics. 2020 Apr 9;21(1):293. doi: 10.1186/s12864-020-6707-9.

Ancient evolutionary signals of protein-coding sequences allow the discovery of new genes in the Drosophila melanogaster genome.

BMC Genomics. 2020 Mar 5;21(1):210. doi: 10.1186/s12864-020-6632-y.

Draft Genome of a Blister Beetle .

Front Genet. 2020 Jan 8;10:1281. doi: 10.3389/fgene.2019.01281. eCollection 2019.

Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models.

BMC Genomics. 2019 Oct 17;20(1):753. doi: 10.1186/s12864-019-6064-8.

Long-Read Annotation: Automated Eukaryotic Genome Annotation Based on Long-Read cDNA Sequencing.

Plant Physiol. 2019 Jan;179(1):38-54. doi: 10.1104/pp.18.00848. Epub 2018 Nov 6.
