Kaneko T, Kotani H, Nakamura Y, Sato S, Asamizu E, Miyajima N, Tabata S
Kazusa DNA Research Institute, Chiba, Japan.
DNA Res. 1998 Apr 30;5(2):131-45. doi: 10.1093/dnares/5.2.131.
The nucleotide sequences of 21 P1 and TAC clones that have been precisely localized on the fine physical map of Arabidopsis thaliana chromosome 5 were determined, and their sequence features were analyzed. The total length of the regions sequenced in this study was 1,381,565 bp, bringing the total length of the chromosome 5 sequences determined so far to 6,691,670 bp together with the regions of the 69 clones previously reported. By computer-aided analyses, including similarity searches against protein and EST databases and gene modeling with computer programs, a total of 337 potential protein-coding genes and/or gene segments were identified on the basis of similarity to reported gene sequences. The average density of the genes and/or gene segments thus assigned was 1 gene/4,100 bp. Introns were identified in 76.7% of the potential protein-coding genes for which the entire gene structure was predicted, and the average number of introns per gene and the average intron length were 3.9 and 176 bp, respectively. These sequence features are essentially identical to those in the previously reported sequences. The numbers of Arabidopsis ESTs matching each of the predicted genes were counted to monitor the transcription level. The sequence data and gene information are available on the World Wide Web database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/
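As a quick consistency check on the figures quoted above, the reported gene density can be recomputed from the sequenced length and the gene count. The sketch below uses only numbers stated in the abstract; the variable and function names are illustrative and not part of the original work.

```python
# Figures taken from the abstract; names are hypothetical, for illustration only.
SEQUENCED_BP = 1_381_565   # length sequenced in this study
TOTAL_BP = 6_691_670       # cumulative chromosome 5 sequence determined so far
GENE_COUNT = 337           # potential protein-coding genes and/or gene segments
NEW_CLONES = 21            # P1 and TAC clones sequenced in this study
PREVIOUS_CLONES = 69       # clones reported previously


def bp_per_gene(region_bp: int, gene_count: int) -> float:
    """Average number of base pairs per gene in a region."""
    return region_bp / gene_count


density = bp_per_gene(SEQUENCED_BP, GENE_COUNT)
previous_bp = TOTAL_BP - SEQUENCED_BP          # length covered by earlier clones
total_clones = NEW_CLONES + PREVIOUS_CLONES    # clones sequenced to date

print(f"Average density: 1 gene per {density:.0f} bp")
print(f"Previously reported sequence: {previous_bp:,} bp")
print(f"Clones sequenced to date: {total_clones}")
```

Rounded to the nearest hundred, the computed density agrees with the reported 1 gene/4,100 bp.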