Crabtree J, Wiltshire T, Brunk B, Zhao S, Schug J, Stoeckert C J, Bucan M
Center for Bioinformatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
Genome Res. 2001 Oct;11(10):1746-57. doi: 10.1101/gr.195101.
The current strategy for sequencing the mouse genome combines a whole-genome shotgun approach with clone-based sequencing. High-resolution physical maps will provide a foundation for assembling contiguous segments of sequence. We have established a bacterial artificial chromosome (BAC)-based map of a 5-Mb region on mouse Chromosome 5, encompassing three gene families: receptor tyrosine kinases (Pdgfra-Kit-Kdr), nonreceptor protein-tyrosine kinases (Tec-Txk), and type-A receptors for the neurotransmitter GABA (Gabra2, Gabrb1, Gabrg1, and Gabra4). Construction of the BAC contig was initiated by hybridization screening of the C57BL/6J (RPCI-23) BAC library, using known genes and sequence-tagged sites (STSs) as probes. Additional overlapping clones were identified by searching the database of available restriction fingerprints for the RPCI-23 and RPCI-24 libraries. This effort resulted in the selection of >600 BAC clones, the generation of 251 kb of BAC-end sequence, and the placement of 40 known and/or predicted genes within this 5-Mb region. We use this high-resolution map to illustrate the integration of the BAC fingerprint map with a radiation-hybrid map via assembled expressed sequence tags (ESTs). From annotation of three representative BAC clones, we demonstrate that up to 98% of the draft sequence for each contig could be ordered and oriented using known genes, BAC ends, consensus sequences for transcript assemblies, and comparisons with orthologous human sequence. For functional studies, annotation of sequence fragments as they are assembled into 50-200-kb stretches will be remarkably valuable.