Asamizu E, Nakamura Y, Sato S, Tabata S
Kazusa DNA Research Institute, Kisarazu, Chiba, Japan.
DNA Res. 2000 Jun 30;7(3):175-80. doi: 10.1093/dnares/7.3.175.
For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. A similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one of these ESTs, among which only 499 regions were also hit by the ESTs deposited in the public database. The result indicates that the EST resource generated in this project complements the EST data in the public database and facilitates new gene discovery.
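The counts above can be cross-checked with a little arithmetic: the three similarity-search categories should partition the 12,028 non-redundant groups, and the coverage figures imply how many genomic regions were detected only by this project's ESTs. The sketch below is illustrative only; the variable names are hypothetical, and the percentages and the 424-region difference are derived here rather than stated in the abstract.

```python
# Figures as reported in the abstract; names are illustrative, not from the paper.
NON_REDUNDANT_GROUPS = 12_028
categories = {
    "similar to genes of known function": 4_816,
    "similar to hypothetical genes": 1_864,
    "novel sequences": 5_348,
}

# The three similarity-search categories should account for every group.
assert sum(categories.values()) == NON_REDUNDANT_GROUPS

for label, count in categories.items():
    print(f"{label}: {count} ({count / NON_REDUNDANT_GROUPS:.1%})")

# Gene coverage on ~10 Mb of annotated sequence from chromosomes 3 and 5.
regions_hit_by_project = 923       # hit by at least one EST from this project
regions_also_hit_by_public = 499   # of those, also hit by public-database ESTs
regions_only_in_project = regions_hit_by_project - regions_also_hit_by_public
print(f"regions hit only by this project's ESTs: {regions_only_in_project} "
      f"({regions_only_in_project / regions_hit_by_project:.1%})")
```

On these numbers, roughly 40% of the groups resemble known genes, and 424 of the 923 covered regions (about 46%) were not hit by any public EST, which is the basis for the claim that the new data complement the public collection.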