Milanesi Luciano, Rogozin Igor B
CNR-ITB, Via Fratelli Cervi, 93, Segrate, MI 20090, Italy.
IEEE Trans Nanobioscience. 2003 Jun;2(2):75-8. doi: 10.1109/tnb.2003.813928.
The completion of a number of large genome sequencing projects emphasizes the importance of protein-coding gene prediction. Most of the problems associated with gene prediction are caused by the complex exon-intron structures commonly found in eukaryotic genomes. However, information from homologous sequences can significantly improve prediction accuracy. In particular, expressed sequence tags (ESTs) are very useful for this purpose, because existing EST collections are very large. We developed the ESTMAP system, which runs homology searches against a database of repetitive elements using the RepeatView program and against the EST Division of GenBank using the BLASTN program. ESTMAP extracts "exact" matches with EST sequences (> 95% homology) from the BLASTN output file and predicts introns in the DNA by comparing the ESTs with the query sequence. ESTMAP is implemented as part of the WebGene system (http://www.cnr.it/webgene).
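The filtering and intron-prediction step described above can be illustrated with a minimal sketch. It is not the ESTMAP implementation, whose details the abstract does not give. It assumes that the BLASTN output is in the standard tabular layout (`-outfmt 6` default columns), that the "> 95%" threshold refers to the percent-identity column, and that only plus-strand EST matches are considered. The names `Hsp`, `parse_tabular`, and `predict_introns`, and the parameters `min_intron` and `max_est_slack`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hsp:
    """One BLASTN alignment segment between the genomic query and an EST."""
    est_id: str
    identity: float  # percent identity reported by BLASTN
    q_start: int     # 1-based coordinates on the genomic query
    q_end: int
    s_start: int     # 1-based coordinates on the EST
    s_end: int


def parse_tabular(lines):
    """Parse BLAST tabular lines, assuming the default -outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send ...
    """
    hsps = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        hsps.append(Hsp(f[1], float(f[2]), int(f[6]), int(f[7]),
                        int(f[8]), int(f[9])))
    return hsps


def predict_introns(hsps, min_identity=95.0, min_intron=20, max_est_slack=5):
    """Return sorted (start, end) query intervals that look like introns.

    An intron is called when two segments of the same EST are adjacent on
    the EST (within max_est_slack bases) but separated on the genomic query
    by at least min_intron bases. Both thresholds are illustrative guesses.
    """
    by_est = defaultdict(list)
    for h in hsps:
        # Keep only near-exact matches on the plus strand of the EST.
        if h.identity > min_identity and h.s_start <= h.s_end:
            by_est[h.est_id].append(h)

    introns = set()
    for group in by_est.values():
        group.sort(key=lambda h: h.q_start)
        for left, right in zip(group, group[1:]):
            est_gap = right.s_start - left.s_end - 1
            query_gap = right.q_start - left.q_end - 1
            if abs(est_gap) <= max_est_slack and query_gap >= min_intron:
                introns.add((left.q_end + 1, right.q_start - 1))
    return sorted(introns)
```

Given two high-identity segments of one EST that cover query positions 101-200 and 501-600 and are contiguous on the EST, this sketch would report the query interval 201-500 as a candidate intron. A real pipeline would also need to handle minus-strand matches and check splice-site signals, which are left out here.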