Coghlan Avril, Durbin Richard
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Bioinformatics. 2007 Jun 15;23(12):1468-75. doi: 10.1093/bioinformatics/btm133. Epub 2007 May 5.
Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other.
We present a method for combining gene-finders called Genomix. Genomix selects the predicted exons that are best conserved within and/or between species in terms of sequence and intron-exon structure, and combines them into a gene structure. Genomix was used to combine predictions from four gene-finders for Caenorhabditis elegans, by selecting the predicted exons that are best conserved with C. briggsae and C. remanei. On a set of approximately 1500 confirmed C. elegans genes, Genomix increased the exon-level specificity by 10.1% and sensitivity by 2.7% compared with the best input gene-finder.
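The abstract does not define its exon-level measures. The sketch below assumes the definitions conventional in gene-prediction benchmarking: a predicted exon counts as correct only if both of its boundaries match a confirmed exon exactly, sensitivity is the fraction of confirmed exons that were predicted, and specificity is the fraction of predicted exons that are correct (called precision in other fields). The function name, the tuple layout and the toy coordinates are hypothetical and are not taken from the Genomix scripts.

```python
from typing import Set, Tuple

# Hypothetical exon representation: (sequence name, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_level_accuracy(predicted: Set[Exon], confirmed: Set[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumes an exon is correct only when both boundaries match exactly.
    """
    true_positives = len(predicted & confirmed)
    # Sensitivity: share of confirmed exons that were predicted correctly.
    sensitivity = true_positives / len(confirmed) if confirmed else 0.0
    # Specificity (gene-prediction usage): share of predicted exons that are correct.
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration only.
confirmed = {
    ("chrI", 100, 200, "+"),
    ("chrI", 300, 400, "+"),
    ("chrI", 500, 600, "+"),
    ("chrI", 700, 800, "+"),
}
predicted = {
    ("chrI", 100, 200, "+"),
    ("chrI", 300, 400, "+"),
    ("chrI", 510, 600, "+"),  # wrong start coordinate, so not counted as correct
}

sensitivity, specificity = exon_level_accuracy(predicted, confirmed)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

In this toy case two of the four confirmed exons are recovered and two of the three predictions are correct. Running the same calculation for each input gene-finder and for the combined prediction is one way a comparison like the one reported above could be set up, under the stated assumptions.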
Scripts and Supplementary Material can be found at http://www.sanger.ac.uk/Software/analysis/genomix