Asamizu E, Sato S, Kaneko T, Nakamura Y, Kotani H, Miyajima N, Tabata S
Kazusa DNA Research Institute, Kisarazu, Chiba, Japan.
DNA Res. 1998 Dec 31;5(6):379-91. doi: 10.1093/dnares/5.6.379.
A total of 17 P1 and TAC clones, each representing an assigned region of chromosome 5, were isolated from P1 and TAC genomic libraries of Arabidopsis thaliana Columbia, and their nucleotide sequences were determined. The clones sequenced in this study total 1,081,958 bp. Together with the 9,072,622 bp previously reported from the analysis of 125 P1 and TAC clones, the total length of chromosome 5 sequence determined so far is 10,154,580 bp. The sequences were subjected to similarity searches against protein and EST databases and analyzed with computer programs for gene modeling. As a result, a total of 253 potential protein-coding genes with known or predicted functions were identified. The positions of exons that show no apparent similarity to known genes were also assigned using exon-prediction programs. The average density of the genes identified in this study was 1 gene per 4277 bp. Introns were observed in 74% of the potential protein-coding genes; the average number of introns per gene was 4.3 and the average intron length was 168 bp. The sequence data and gene information are available on the World Wide Web database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/.
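The summary figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and it assumes the reported gene density is simply the newly sequenced length divided by the number of genes identified, rounded to the nearest base pair.

```python
# Figures quoted in the abstract (base pairs and gene count).
NEW_BP = 1_081_958        # 17 clones sequenced in this study
PREVIOUS_BP = 9_072_622   # 125 clones reported earlier
GENES = 253               # potential protein-coding genes identified


def total_sequenced(new_bp: int, previous_bp: int) -> int:
    """Cumulative chromosome 5 sequence length in bp."""
    return new_bp + previous_bp


def bp_per_gene(region_bp: int, genes: int) -> int:
    """Average spacing between genes, rounded to the nearest bp (assumed definition)."""
    return round(region_bp / genes)


print(total_sequenced(NEW_BP, PREVIOUS_BP))  # 10154580
print(bp_per_gene(NEW_BP, GENES))            # 4277
```

Both values agree with the totals stated in the abstract (10,154,580 bp and 1 gene per 4277 bp).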