Reranking candidate gene models with cross-species comparison for improved gene prediction.

Author information

Liu Qian, Crammer Koby, Pereira Fernando C N, Roos David S

Affiliation

Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Publication information

BMC Bioinformatics. 2008 Oct 14;9:433. doi: 10.1186/1471-2105-9-433.

DOI:10.1186/1471-2105-9-433

PMID:18854050

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2587481/

Abstract

BACKGROUND

Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features.

RESULTS

We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+.
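
The reranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `GeneModel` fields, the feature names, the hand-set `WEIGHTS`, and the use of `difflib.SequenceMatcher` as a stand-in for a protein alignment score are all assumptions made for illustration. The abstract states only that candidates are compared with probable orthologs on coding sequence, splice site location, and signal peptide occurrence.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class GeneModel:
    """A candidate gene model or a putative ortholog (all fields are illustrative)."""
    protein: str               # translated coding sequence
    splice_sites: frozenset    # splice-site positions, in a shared coordinate system
    has_signal_peptide: bool
    finder_score: float = 0.0  # log-score from the gene finder (candidates only)


def features(candidate, ortholog):
    """Global features comparing one candidate with one putative ortholog."""
    union = candidate.splice_sites | ortholog.splice_sites
    shared = candidate.splice_sites & ortholog.splice_sites
    return {
        "finder_score": candidate.finder_score,
        # Crude stand-in for a real protein alignment score.
        "coding_similarity": SequenceMatcher(
            None, candidate.protein, ortholog.protein
        ).ratio(),
        # Jaccard overlap of splice-site positions (1.0 when neither has any).
        "splice_agreement": len(shared) / len(union) if union else 1.0,
        "signal_peptide_match": float(
            candidate.has_signal_peptide == ortholog.has_signal_peptide
        ),
    }


# Hypothetical hand-set weights; a real system would fit them on annotated genes.
WEIGHTS = {
    "finder_score": 1.0,
    "coding_similarity": 2.0,
    "splice_agreement": 1.0,
    "signal_peptide_match": 0.5,
}


def rerank(candidates, ortholog, weights=WEIGHTS):
    """Sort the K best candidates by a weighted sum of global features."""
    def score(candidate):
        values = features(candidate, ortholog)
        return sum(weights[name] * value for name, value in values.items())
    return sorted(candidates, key=score, reverse=True)


# Toy locus: the second candidate has the lower gene-finder score but
# agrees with the ortholog on all three comparative features.
ortholog = GeneModel("MKTLLVAGSWQ", frozenset({4, 8}), True)
k_best = [
    GeneModel("MKTLLQ", frozenset({4}), False, finder_score=-10.0),
    GeneModel("MKTLLVAGSWQ", frozenset({4, 8}), True, finder_score=-10.5),
]
best = rerank(k_best, ortholog)[0]
```

In this toy example the candidate ranked second by the gene finder is promoted to first place because its agreement with the ortholog outweighs the small difference in the original score. Setting the three comparative weights to zero recovers the gene finder's own ranking.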

CONCLUSION

Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
