Mortlock Douglas P, Guenther Catherine, Kingsley David M
Department of Developmental Biology and Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, California 94305-5329, USA.
Genome Res. 2003 Sep;13(9):2069-81. doi: 10.1101/gr.1306003. Epub 2003 Aug 12.
Regulatory sequences in higher genomes can map large distances from gene coding regions, and cannot yet be identified by simple inspection of primary DNA sequence information. Here we describe an efficient method of surveying large genomic regions for gene regulatory information, and subdividing complex sets of distant regulatory elements into smaller intervals for detailed study. The mouse Gdf6 gene is expressed in a number of distinct embryonic locations that are involved in the patterning of skeletal and soft tissues. To identify sequences responsible for Gdf6 regulation, we first isolated a series of overlapping bacterial artificial chromosomes (BACs) that extend varying distances upstream and downstream of the gene. A LacZ reporter cassette was integrated into the Gdf6 transcription unit of each BAC using homologous recombination in bacteria. Each modified BAC was injected into fertilized mouse eggs, and founder transgenic embryos were analyzed for LacZ expression mid-gestation. The overlapping segments defined by the BAC clones revealed five separate regulatory regions that drive LacZ expression in 11 distinct anatomical locations. To further localize sequences that control expression in developing skeletal joints, we created a series of BAC constructs with precise deletions across a putative joint-control region. This approach further narrowed the critical control region to an area containing several stretches of sequence that are highly conserved between mice and humans. A distant 2.9-kilobase fragment containing the highly conserved regions is able to direct very specific expression of a minimal promoter/LacZ reporter in proximal limb joints. These results demonstrate that even distant, complex regulatory sequences can be identified using a combination of BAC scanning, BAC deletion, and comparative sequencing approaches.
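The interval logic behind BAC scanning can be sketched in a few lines. This is only an illustration of the reasoning, not the authors' analysis: it assumes each expression domain is controlled by a single element, that an element must lie within every BAC that drives the domain, and that it lies outside every BAC that does not. The `Bac` class, the `candidate_intervals` function, the clone names, the coordinates (in kb relative to the gene), and the domain labels are all hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bac:
    """A reporter-tagged BAC clone (hypothetical representation)."""
    name: str
    start: int          # kb relative to the gene; made-up coordinate
    end: int            # kb relative to the gene; made-up coordinate
    domains: frozenset  # anatomical domains where LacZ staining was seen


def candidate_intervals(bacs, domain):
    """Return intervals that could hold the element driving `domain`.

    The element is assumed to lie in the overlap of all BACs that drive
    the domain, excluding any stretch covered by a BAC that does not.
    """
    positive = [b for b in bacs if domain in b.domains]
    negative = [b for b in bacs if domain not in b.domains]
    if not positive:
        return []

    # Overlap shared by every positive BAC.
    lo = max(b.start for b in positive)
    hi = min(b.end for b in positive)
    if lo >= hi:
        return []  # no shared overlap; the single-element assumption fails

    # Remove stretches covered by negative BACs.
    intervals = [(lo, hi)]
    for neg in negative:
        remaining = []
        for a, b in intervals:
            if neg.end <= a or neg.start >= b:
                remaining.append((a, b))          # no overlap, keep whole
                continue
            if a < neg.start:
                remaining.append((a, neg.start))  # piece left of negative BAC
            if neg.end < b:
                remaining.append((neg.end, b))    # piece right of negative BAC
        intervals = remaining
    return intervals


# Toy clones with invented coordinates and domain labels.
bacs = [
    Bac("BAC_A", -100, 20, frozenset({"joints"})),
    Bac("BAC_B", -40, 80, frozenset({"joints", "other_tissue"})),
    Bac("BAC_C", 10, 150, frozenset({"other_tissue"})),
]

print(candidate_intervals(bacs, "joints"))        # [(-40, 10)]
print(candidate_intervals(bacs, "other_tissue"))  # [(20, 80)]
```

In this toy case the joint element is confined to the 50 kb shared by BAC_A and BAC_B but absent from BAC_C. A real analysis would also have to allow for redundant elements, position effects in individual founder embryos, and domains that need more than one region, none of which this sketch models.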