Allen Jonathan E, Pertea Mihaela, Salzberg Steven L
The Institute for Genomic Research, Rockville, Maryland 20850, USA.
Genome Res. 2004 Jan;14(1):142-8. doi: 10.1101/gr.1562804.
This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice site predictions, and other evidence. Three different algorithms for combining evidence in the Combiner were implemented and tested on 1783 confirmed genes in Arabidopsis thaliana. Our results show that combining gene prediction evidence consistently outperforms even the best individual gene finder and, in some cases, can produce dramatic improvements in sensitivity and specificity.
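To make the idea of combining evidence concrete, the following is a minimal sketch of one simple scheme: a per-base weighted vote over exon intervals reported by several sources. It is an illustration only and is not claimed to be any of the three algorithms implemented in Combiner. The function names, source names, weights, and threshold are all hypothetical. The accuracy helper assumes the convention common in gene-finding evaluations, where specificity is the fraction of predicted coding bases that are truly coding; the abstract does not state which definition was used.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the sequence


def covered_positions(intervals: List[Interval], seq_len: int) -> Set[int]:
    """Return the set of sequence positions covered by any interval."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(max(start, 0), min(end, seq_len)))
    return covered


def weighted_vote(
    seq_len: int,
    evidence: Dict[str, List[Interval]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[Interval]:
    """Merge intervals from several sources by a per-base weighted vote.

    A base is kept when the summed weight of the sources covering it
    exceeds `threshold` times the total weight (hypothetical rule).
    """
    score = [0.0] * seq_len
    for source, intervals in evidence.items():
        # Each source votes at most once per base, even if its intervals overlap.
        for pos in covered_positions(intervals, seq_len):
            score[pos] += weights[source]

    cutoff = threshold * sum(weights[source] for source in evidence)

    # Collect maximal runs of bases whose score exceeds the cutoff.
    merged: List[Interval] = []
    run_start = None
    for pos, value in enumerate(score):
        if value > cutoff:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            merged.append((run_start, pos))
            run_start = None
    if run_start is not None:
        merged.append((run_start, seq_len))
    return merged


def nucleotide_accuracy(
    predicted: List[Interval], truth: List[Interval], seq_len: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed convention: sensitivity = TP / (TP + FN),
    specificity = TP / (TP + FP).
    """
    pred = covered_positions(predicted, seq_len)
    true = covered_positions(truth, seq_len)
    tp = len(pred & true)
    sensitivity = tp / len(true) if true else 0.0
    specificity = tp / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical evidence: two ab initio gene finders and one EST alignment.
    evidence = {
        "genefinder_a": [(10, 20)],
        "genefinder_b": [(12, 22)],
        "est_alignment": [(12, 20)],
    }
    weights = {"genefinder_a": 1.0, "genefinder_b": 1.0, "est_alignment": 2.0}
    consensus = weighted_vote(30, evidence, weights)
    print(consensus)  # [(12, 20)]
    print(nucleotide_accuracy(consensus, [(10, 20)], 30))  # (0.8, 1.0)
```

In this toy input, the alignment evidence is given twice the weight of each gene finder, so only the bases supported by more than half of the total weight survive. A real combiner would also need to respect reading frame, splice sites, and gene structure, which this sketch ignores.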