Celniker Susan E, Wheeler David A, Kronmiller Brent, Carlson Joseph W, Halpern Aaron, Patel Sandeep, Adams Mark, Champe Mark, Dugan Shannon P, Frise Erwin, Hodgson Ann, George Reed A, Hoskins Roger A, Laverty Todd, Muzny Donna M, Nelson Catherine R, Pacleb Joanne M, Park Soo, Pfeiffer Barret D, Richards Stephen, Sodergren Erica J, Svirskas Robert, Tabor Paul E, Wan Kenneth, Stapleton Mark, Sutton Granger G, Venter Craig, Weinstock George, Scherer Steven E, Myers Eugene W, Gibbs Richard A, Rubin Gerald M
Berkeley Drosophila Genome Project, Department of Genome Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
Genome Biol. 2002;3(12):RESEARCH0079. doi: 10.1186/gb-2002-3-12-research0079. Epub 2002 Dec 23.
The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions.
Our finishing process was designed to close gaps, improve sequence quality and validate the assembly. Sequence traces derived from the WGS and draft sequencing of individual bacterial artificial chromosomes (BACs) were assembled into BAC-sized segments. These segments were brought to high quality, and then joined to constitute the sequence of each chromosome arm. Overall assembly was verified by comparison to a physical map of fingerprinted BAC clones. In the current version of the 116.9 Mb euchromatic genome, called Release 3, the six euchromatic chromosome arms are represented by 13 scaffolds with a total of 37 sequence gaps. We compared Release 3 to Release 2; in autosomal regions of unique sequence, the error rate of Release 2 was one in 20,000 bp.
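To put the reported Release 2 error rate in more familiar terms, the arithmetic can be sketched as follows. This is a minimal illustration, not part of the authors' analysis: the function names are hypothetical, the Phred-style score is simply the standard -10·log10 transform of an error probability, and applying the autosomal unique-sequence rate to the whole 116.9 Mb euchromatic genome is an assumption made only to give a rough order of magnitude.

```python
import math

# Figures quoted in the abstract.
RELEASE2_ERROR_RATE = 1 / 20_000      # errors per bp, autosomal unique sequence
EUCHROMATIC_LENGTH_BP = 116_900_000   # 116.9 Mb euchromatic genome (Release 3)


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-style quality score."""
    return -10.0 * math.log10(error_rate)


def expected_errors(length_bp: int, error_rate: float) -> float:
    """Expected error count if the rate applied uniformly (an assumption)."""
    return length_bp * error_rate


if __name__ == "__main__":
    q = phred_quality(RELEASE2_ERROR_RATE)
    n = expected_errors(EUCHROMATIC_LENGTH_BP, RELEASE2_ERROR_RATE)
    print(f"Phred-equivalent quality: Q{q:.1f}")
    print(f"Naive expected errors genome-wide: {n:.0f}")
```

Because the abstract reports the rate only for autosomal regions of unique sequence, the genome-wide count is best read as a rough lower-bound style estimate; repeat-rich regions, where the initial assembly method was flawed, would not be expected to follow it.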
The WGS strategy can efficiently produce a high-quality sequence of a metazoan genome while generating the reagents required for sequence finishing. However, the initial method of repeat assembly was flawed. The sequence we report here, Release 3, is a reliable resource for molecular genetic experimentation and computational analysis.