Iseli C, Jongeneel C V, Bucher P
Swiss Institute of Bioinformatics, Epalinges, Switzerland.
Proc Int Conf Intell Syst Mol Biol. 1999:138-48.
One of the problems associated with the large-scale analysis of unannotated, low quality EST sequences is the detection of coding regions and the correction of frameshift errors that they often contain. We introduce a new type of hidden Markov model that explicitly deals with the possibility of errors in the sequence to analyze, and incorporates a method for correcting these errors. This model was implemented in an efficient and robust program, ESTScan. We show that ESTScan can detect and extract coding regions from low-quality sequences with high selectivity and sensitivity, and is able to accurately correct frameshift errors. In the framework of genome sequencing projects, ESTScan could become a very useful tool for gene discovery, for quality control, and for the assembly of contigs representing the coding regions of genes.
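To make the frameshift problem concrete, the following minimal sketch shows how a single missing base scrambles every codon downstream of it, and how padding the read at that position restores the reading frame. It is an illustration only, not ESTScan's algorithm: the sequence, the helper names `translate` and `repair_deletion`, and the error position are all hypothetical, and the position is supplied by hand here, whereas ESTScan infers it with its hidden Markov model.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate seq in frame 0; codons containing a non-ACGT base become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def repair_deletion(seq, pos):
    """Insert a placeholder 'N' at pos to restore the reading frame.

    Hypothetical helper: the position is assumed to be known already.
    """
    return seq[:pos] + "N" + seq[pos:]


# Made-up coding sequence and a read of it that lost the base at index 7.
true_cds = "ATGGCTGAAAAGCTGTTCGAT"
est_read = true_cds[:7] + true_cds[8:]

print(translate(true_cds))                      # intended protein
print(translate(est_read))                      # frame lost after the deletion
print(translate(repair_deletion(est_read, 7)))  # frame restored, one unknown residue
```

The repaired read recovers the downstream residues at the cost of one ambiguous amino acid, which is why locating the error matters more than recovering the exact missing base.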