Ashburner M, Misra S, Roote J, Lewis S E, Blazej R, Davis T, Doyle C, Galle R, George R, Harris N, Hartzell G, Harvey D, Hong L, Houston K, Hoskins R, Johnson G, Martin C, Moshrefi A, Palazzolo M, Reese M G, Spradling A, Tsang G, Wan K, Whitelaw K, Celniker S
Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, England.
Genetics. 1999 Sep;153(1):179-219. doi: 10.1093/genetics/153.1.179.
A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 polytene chromosome bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species.
"Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it." Milne 1926
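The counts, densities, and percentages quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and it assumes the percentages are taken over the 218 predicted protein-coding genes, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract.
PROTEIN_CODING_GENES = 218
TRANSPOSABLE_ELEMENTS = 17
KB_PER_GENE = 13        # "one protein-coding gene every 13 kb"
KB_PER_ELEMENT = 171    # "one element every 171 kb"
GENES_WITH_EST_MATCH = 95
GENES_WITH_PROTEIN_SIMILARITY = 144


def implied_length_kb(count: int, kb_per_item: int) -> int:
    """Region length implied by a feature count and its quoted spacing."""
    return count * kb_per_item


def percent_of_genes(subset: int) -> int:
    """Share of the gene set, rounded to a whole percent.

    Assumption: the denominator is the 218 predicted protein-coding genes.
    """
    return round(100 * subset / PROTEIN_CODING_GENES)


length_from_genes = implied_length_kb(PROTEIN_CODING_GENES, KB_PER_GENE)
length_from_elements = implied_length_kb(TRANSPOSABLE_ELEMENTS, KB_PER_ELEMENT)
est_percent = percent_of_genes(GENES_WITH_EST_MATCH)
similarity_percent = percent_of_genes(GENES_WITH_PROTEIN_SIMILARITY)

print(f"Length implied by gene density: {length_from_genes} kb")
print(f"Length implied by element density: {length_from_elements} kb")
print(f"Genes matching an EST: {est_percent}%")
print(f"Genes with protein similarity: {similarity_percent}%")
```

Both implied lengths (2834 kb and 2907 kb) are consistent with "nearly 3 Mb" once the rounding of the quoted densities is allowed for, and the two percentages reproduce the quoted 44% and 66% under the stated assumption about the denominator.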