Gordon D, Abajian C, Green P
Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195-7730, USA.
Genome Res. 1998 Mar;8(3):195-202. doi: 10.1101/gr.8.3.195.
Sequencing of large clones or small genomes is generally done by the shotgun approach (Anderson et al. 1982). This has two phases: (1) a shotgun phase, in which a number of reads are generated from random subclones and assembled into contigs, followed by (2) a directed, or finishing, phase, in which the assembly is inspected for correctness and for various kinds of data anomalies (such as contaminant reads, unremoved vector sequence, and chimeric or deleted reads), additional data are collected to close gaps and resolve low-quality regions, and editing is performed to correct assembly or base-calling errors. Finishing is currently a bottleneck in large-scale sequencing efforts, and throughput gains will depend both on reducing the need for human intervention and on making that intervention as efficient as possible. We have developed a finishing tool, consed, which attempts to implement these principles. A distinguishing feature relative to other programs is the use of error probabilities from our programs phred and phrap as an objective criterion to guide the entire finishing process. More information is available at http://www.genome.washington.edu/consed/consed.html.
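The abstract names phred/phrap error probabilities as the objective criterion for finishing. The following is a minimal sketch that assumes per-base consensus qualities on the standard phred scale, Q = -10 log10(P), where P is the probability that the base call is wrong. It shows how such values can be converted to error probabilities, summed into an expected error count, and scanned for runs of low-quality bases that might be targeted for additional data. The function names and the Q20 cutoff are hypothetical choices made for illustration. They are not consed's actual interface or finishing rules.

```python
"""Sketch: phred-scaled qualities as an objective finishing criterion.

Assumes per-base consensus qualities on the phred scale, Q = -10 * log10(P_error).
The function names and the default threshold of 20 are hypothetical, not consed's.
"""
from typing import List, Tuple


def error_probability(q: int) -> float:
    """Convert a phred quality value Q to its error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(quals: List[int]) -> float:
    """Expected number of base errors in a region: the sum of per-base error probabilities."""
    return sum(error_probability(q) for q in quals)


def low_quality_regions(quals: List[int], min_q: int = 20) -> List[Tuple[int, int]]:
    """Return half-open [start, end) runs of consecutive bases with quality below min_q."""
    regions = []
    start = None
    for i, q in enumerate(quals):
        if q < min_q and start is None:
            start = i  # a low-quality run begins at this base
        elif q >= min_q and start is not None:
            regions.append((start, i))  # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(quals)))  # the run extends to the end of the contig
    return regions


if __name__ == "__main__":
    # Hypothetical consensus qualities for a 10-base stretch of a contig.
    consensus_quals = [40, 40, 35, 12, 8, 15, 30, 45, 10, 40]
    print(low_quality_regions(consensus_quals))  # prints [(3, 6), (8, 9)]
    print(expected_errors(consensus_quals))
```

Because each quality value maps to a probability, regions can be compared and prioritized numerically (for example, by expected error count) rather than by visual inspection alone, which is the spirit of using error probabilities as an objective criterion.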