Benos P V, Gatt M K, Murphy L, Harris D, Barrell B, Ferraz C, Vidal S, Brun C, Demaille J, Cadieu E, Dreano S, Gloux S, Lelaure V, Mottier S, Galibert F, Borkova D, Miñana B, Kafatos F C, Bolshakov S, Sidén-Kiamos I, Papagiannakis G, Spanos L, Louis C, Madueño E, de Pablos B, Modolell J, Peter A, Schöttler P, Werner M, Mourkioti F, Beinert N, Dowe G, Schäfer U, Jäckle H, Bucheton A, Callister D, Campbell L, Henderson N S, McMillan P J, Salles C, Tait E, Valenti P, Saunders R D, Billaud A, Pachter L, Glover D M, Ashburner M
EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
Genome Res. 2001 May;11(5):710-30. doi: 10.1101/gr.173801.
We present the sequence of a contiguous 2.63 Mb of DNA extending from the tip of the X chromosome of Drosophila melanogaster. Within this sequence, we predict 277 protein coding genes, of which 94 had been sequenced already in the course of studying the biology of their gene products, and examples of 12 different transposable elements. We show that an interval between bands 3A2 and 3C2, believed in the 1970s to show a correlation between the number of bands on the polytene chromosomes and the 20 genes identified by conventional genetics, is predicted to contain 45 genes from its DNA sequence. We have determined the insertion sites of P-elements from 111 mutant lines, about half of which are in a position likely to affect the expression of novel predicted genes, thus representing a resource for subsequent functional genomic analysis. We compare the European Drosophila Genome Project sequence with the corresponding part of the independently assembled and annotated Joint Sequence determined through "shotgun" sequencing. Discounting differences in the distribution of known transposable elements between the strains sequenced in the two projects, we detected three major sequence differences, two of which are probably explained by errors in assembly; the origin of the third major difference is unclear. In addition there are eight sequence gaps within the Joint Sequence. At least six of these eight gaps are likely to be sites of transposable elements; the other two are complex. Of the 275 genes in common to both projects, 60% are identical within 1% of their predicted amino-acid sequence and 31% show minor differences such as in choice of translation initiation or termination codons; the remaining 9% show major differences in interpretation.