Tchourbanov Alexandre, Quest Daniel, Ali Hesham, Pauley Mark, Norgren Robert
Department of Computer Science, University of Nebraska at Omaha, 68182-0116, USA.
Proc IEEE Comput Soc Bioinform Conf. 2003;2:353-62.
This paper addresses accurate, automatic gene annotation based on the precise identification and annotation of exon and intron boundaries in biologically verified nucleotide sequences, obtained by aligning human genomic DNA to curated mRNA transcripts. We provide a detailed description of a new cDNA/DNA homology-based gene annotation algorithm that combines the results of BLASTN searches with spliced alignments. Compared with other programs currently in use, the algorithm significantly improves annotation quality through the unambiguous joining of genomic DNA sequences. We also address the annotation of genes with non-canonical splice sites and short exons. The approach has been tested on the Genie learning subset as well as on the full-scale human RefSeq, and has demonstrated performance as high as 97%.
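To illustrate what "non-canonical splice sites" means once exon boundaries are known, the following is a minimal sketch, not the authors' algorithm. It assumes exon coordinates on the + strand have already been obtained from some spliced alignment (the function name `classify_introns` and the coordinate convention are hypothetical), and it labels each intron by its donor and acceptor dinucleotides: GT-AG is the canonical pair, while GC-AG and AT-AC are well-known minor pairs.

```python
from typing import List, Tuple

# Intron boundary dinucleotides (donor, acceptor) on the + strand.
# GT-AG is the canonical pair; GC-AG and AT-AC are well-known minor pairs.
KNOWN_PAIRS = {
    ("GT", "AG"): "canonical GT-AG",
    ("GC", "AG"): "non-canonical GC-AG",
    ("AT", "AC"): "non-canonical AT-AC",
}


def classify_introns(genome: str, exons: List[Tuple[int, int]]) -> List[str]:
    """Label the intron between each pair of consecutive exons.

    Assumption: `exons` are 0-based, half-open (start, end) intervals on the
    + strand, sorted by start, e.g. as produced by a spliced aligner.
    """
    labels = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron = genome[left_end:right_start].upper()
        if len(intron) < 4:
            # Too short to hold both a donor and an acceptor dinucleotide.
            labels.append("too short")
            continue
        donor, acceptor = intron[:2], intron[-2:]
        labels.append(KNOWN_PAIRS.get((donor, acceptor), f"other {donor}-{acceptor}"))
    return labels


genome = "AAAA" + "GTCCCCAG" + "TTTT" + "GCAAAAAG" + "CCCC"
print(classify_introns(genome, [(0, 4), (12, 16), (24, 28)]))
```

Run on the toy sequence above, the first intron is reported as canonical GT-AG and the second as non-canonical GC-AG; any other boundary pair is reported as "other" so that it can be flagged for closer inspection rather than silently rejected.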