Marra M, Kucaba T, Sekhon M, Hillier L, Martienssen R, Chinwalla A, Crockett J, Fedele J, Grover H, Gund C, McCombie W R, McDonald K, McPherson J, Mudd N, Parnell L, Schein J, Seim R, Shelby P, Waterston R, Wilson R
Washington University Genome Sequencing Center, St Louis, Missouri 63108, USA.
Nat Genet. 1999 Jul;22(3):265-70. doi: 10.1038/10327.
Arabidopsis thaliana has emerged as a model system for studies of plant genetics and development, and its genome has been targeted for sequencing by an international consortium (the Arabidopsis Genome Initiative; http://genome-www.stanford.edu/Arabidopsis/agi.html). To support the genome-sequencing effort, we fingerprinted more than 20,000 BACs (ref. 2) from two high-quality publicly available libraries, generating an estimated 17-fold redundant coverage of the genome, and used the fingerprints to nucleate assembly of the data by computer. Subsequent manual revision of the assemblies resulted in the incorporation of 19,661 fingerprinted BACs into 169 ordered sets of overlapping clones ('contigs'), each containing at least 3 clones. These contigs are ideal for parallel selection of BACs for large-scale sequencing and have supported the generation of more than 5.8 Mb of finished genome sequence submitted to GenBank; analysis of the sequence has confirmed the integrity of contigs constructed using this fingerprint data. Placement of contigs onto chromosomes can now be performed, and is being pursued by groups involved in both sequencing and positional cloning studies. To our knowledge, these data provide the first example of whole-genome random BAC fingerprint analysis of a eucaryote, and have provided a model essential to efforts aimed at generating similar databases of fingerprint contigs to support sequencing of other complex genomes, including that of human.
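The "17-fold redundant coverage" figure follows from the usual relation: coverage = (number of clones × mean insert size) / genome size. The abstract gives neither the mean BAC insert size nor the genome size used, so the sketch below treats both as parameters. The values in the example (a 100 kb mean insert and a 120 Mb genome) are illustrative assumptions, not figures from the paper.

```python
def fold_coverage(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Expected fold redundancy: total cloned bases divided by genome size."""
    return n_clones * mean_insert_bp / genome_bp


def implied_mean_insert(n_clones: int, coverage: float, genome_bp: float) -> float:
    """Mean insert size (bp) implied by a stated coverage and clone count."""
    return coverage * genome_bp / n_clones


if __name__ == "__main__":
    # ASSUMED values for illustration only; the abstract does not state them.
    assumed_genome_bp = 120_000_000   # hypothetical genome size
    assumed_insert_bp = 100_000       # hypothetical mean BAC insert size

    # 20,000 clones is the approximate count given in the abstract.
    print(fold_coverage(20_000, assumed_insert_bp, assumed_genome_bp))

    # Mean insert size that would yield 17-fold coverage at that genome size.
    print(implied_mean_insert(20_000, 17, assumed_genome_bp))
```

Under these assumed inputs the two functions are inverses of each other, which makes it easy to check whether a stated coverage is consistent with a given library size.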