Computational gene prediction using multiple sources of evidence.

Author information

Allen Jonathan E, Pertea Mihaela, Salzberg Steven L

Affiliation

The Institute for Genomic Research, Rockville, Maryland 20850, USA.

Publication information

Genome Res. 2004 Jan;14(1):142-8. doi: 10.1101/gr.1562804.

DOI:10.1101/gr.1562804

PMID:14707176

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC314291/

Abstract

This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice site predictions, and other evidence. Three different algorithms for combining evidence in the Combiner were implemented and tested on 1783 confirmed genes in Arabidopsis thaliana. Our results show that combining gene prediction evidence consistently outperforms even the best individual gene finder and, in some cases, can produce dramatic improvements in sensitivity and specificity.
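The abstract does not say how the three combination algorithms work, so the following is only a minimal sketch of the general idea of merging evidence tracks, assuming a simple per-base weighted vote. It is not the Combiner's actual method; the function name `weighted_vote`, the source names, the weights, and the threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

# Half-open, 0-based interval [start, end) on the genomic sequence.
Interval = Tuple[int, int]


def weighted_vote(
    seq_len: int,
    evidence: Dict[str, List[Interval]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[Interval]:
    """Call a base 'coding' when the weighted fraction of evidence
    sources covering it is strictly greater than `threshold`.

    Illustrative assumption only: a real combiner would also model
    splice sites, reading frame, and exon boundaries.
    """
    total = sum(weights[name] for name in evidence)
    if total <= 0:
        return []

    score = [0.0] * seq_len
    for name, intervals in evidence.items():
        # Mark each base once per source, even if its intervals overlap.
        covered = [False] * seq_len
        for start, end in intervals:
            for pos in range(max(0, start), min(seq_len, end)):
                covered[pos] = True
        for pos in range(seq_len):
            if covered[pos]:
                score[pos] += weights[name]

    # Collapse consecutive called bases into intervals.
    regions: List[Interval] = []
    run_start = None
    for pos in range(seq_len):
        called = score[pos] / total > threshold
        if called and run_start is None:
            run_start = pos
        elif not called and run_start is not None:
            regions.append((run_start, pos))
            run_start = None
    if run_start is not None:
        regions.append((run_start, seq_len))
    return regions


# Hypothetical evidence tracks on a 100-base sequence.
example_evidence = {
    "gene_finder_a": [(10, 50)],
    "gene_finder_b": [(20, 60)],
    "est_alignment": [(20, 50)],
}
equal_weights = {name: 1.0 for name in example_evidence}
print(weighted_vote(100, example_evidence, equal_weights))  # [(20, 50)]
```

With equal weights, only the region supported by a majority of sources is kept. Raising the weight of one source (for example, a gene finder believed to be more reliable) lets that source carry regions on its own, which is one simple way such weights could be tuned against a set of confirmed genes.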


Similar articles

Computational gene prediction using multiple sources of evidence.

Genome Res. 2004 Jan;14(1):142-8. doi: 10.1101/gr.1562804.

Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring.

J Mol Biol. 2000 Apr 14;297(5):1075-85. doi: 10.1006/jmbi.2000.3641.

Integrating alternative splicing detection into gene prediction.

BMC Bioinformatics. 2005 Feb 10;6:25. doi: 10.1186/1471-2105-6-25.

Computational modeling of gene structure in Arabidopsis thaliana.

Plant Mol Biol. 2002 Jan;48(1-2):49-58.

AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome.

Genome Biol. 2006;7 Suppl 1(Suppl 1):S11.1-8. doi: 10.1186/gb-2006-7-s1-s11. Epub 2006 Aug 7.

Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis.

Genome Res. 2005 Apr;15(4):487-95. doi: 10.1101/gr.3176505.

JIGSAW: integration of multiple sources of evidence for gene prediction.

Bioinformatics. 2005 Sep 15;21(18):3596-603. doi: 10.1093/bioinformatics/bti609. Epub 2005 Aug 2.

Computational discovery of internal micro-exons.

Genome Res. 2003 Jun;13(6A):1216-21. doi: 10.1101/gr.677503.

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

Evidence-based gene predictions in plant genomes.

Genome Res. 2009 Oct;19(10):1912-23. doi: 10.1101/gr.088997.108. Epub 2009 Jun 18.

Cited by

GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

Genome Res. 2024 Jun 25;34(5):757-768. doi: 10.1101/gr.278373.123.

A chromosome-level genome assembly for the amphibious plant Rorippa aquatica reveals its allotetraploid origin and mechanisms of heterophylly upon submergence.

Commun Biol. 2024 Apr 18;7(1):431. doi: 10.1038/s42003-024-06088-7.

A new gene finding tool GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

bioRxiv. 2024 Apr 17:2023.01.13.524024. doi: 10.1101/2023.01.13.524024.

Gene prediction in the immunoglobulin loci.

Genome Res. 2022 Jun;32(6):1152-1169. doi: 10.1101/gr.276676.122. Epub 2022 May 11.

TSEBRA: transcript selector for BRAKER.

BMC Bioinformatics. 2021 Nov 25;22(1):566. doi: 10.1186/s12859-021-04482-0.

Accurate prediction of cis-regulatory modules reveals a prevalent regulatory genome of humans.

NAR Genom Bioinform. 2021 Jun 17;3(2):lqab052. doi: 10.1093/nargab/lqab052. eCollection 2021 Jun.

Using intron position conservation for homology-based gene prediction.

Nucleic Acids Res. 2016 May 19;44(9):e89. doi: 10.1093/nar/gkw092. Epub 2016 Feb 17.

High speed BLASTN: an accelerated MegaBLAST search tool.

Nucleic Acids Res. 2015 Sep 18;43(16):7762-8. doi: 10.1093/nar/gkv784. Epub 2015 Aug 6.

IPred - integrating ab initio and evidence based gene predictions to improve prediction accuracy.

BMC Genomics. 2015 Feb 26;16(1):134. doi: 10.1186/s12864-015-1315-9.

WISCOD: a statistical web-enabled tool for the identification of significant protein coding regions.

Biomed Res Int. 2014;2014:282343. doi: 10.1155/2014/282343. Epub 2014 Sep 15.
