Gajer Pawel, Schatz Michael, Salzberg Steven L
The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
Nucleic Acids Res. 2004 Jan 26;32(2):562-9. doi: 10.1093/nar/gkh216. Print 2004.
By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery. We describe the algorithm and its application in a large set of recent genome sequencing projects. The number of erroneous base calls in these projects was reduced by 80%. In an analysis of over one million corrections, we found that AutoEditor made just one error per 8828 corrections. By substantially increasing the accuracy of base calling, AutoEditor can dramatically accelerate the process of finishing genomes, which involves closing all gaps and ensuring minimum quality standards for the final sequence. It also greatly improves our ability to discover single nucleotide polymorphisms (SNPs) between closely related strains and isolates of the same species.
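To put the reported figures in scale, the arithmetic can be sketched as below. This is only an illustration: the function names are hypothetical, the Phred-style conversion is an assumption introduced here for comparison and is not a number reported in the abstract, and "one million" is used as a round lower bound for the "over one million corrections" analyzed.

```python
import math

def correction_error_rate(corrections_per_error: float) -> float:
    """Fraction of corrections that are wrong, given one error per N corrections."""
    return 1.0 / corrections_per_error

def phred_equivalent(error_rate: float) -> float:
    """Express an error rate on a Phred-style scale, Q = -10 * log10(p).

    Assumption: this scale is used only for illustration; the abstract
    does not report a quality value for AutoEditor's corrections.
    """
    return -10.0 * math.log10(error_rate)

# One error per 8828 corrections, as stated in the abstract.
rate = correction_error_rate(8828)

# Expected number of wrong corrections among one million corrections.
expected_errors = 1_000_000 * rate

# The same rate on a Phred-style scale (illustrative assumption).
q_value = phred_equivalent(rate)

# An 80% reduction leaves one fifth of the original erroneous base calls.
remaining_fraction = 1.0 - 0.80

print(f"error rate per correction: {rate:.6%}")
print(f"expected wrong corrections per million: {expected_errors:.0f}")
print(f"Phred-style equivalent: Q{q_value:.1f}")
print(f"fraction of erroneous base calls remaining: {remaining_fraction:.0%}")
```

Under these assumptions, roughly 113 of every million corrections would be wrong, which corresponds to a quality of about Q39 on the Phred-style scale.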