Rudd K E
Department of Biochemistry, University of Miami School of Medicine, Miami, FL 33101-6129, USA.
Nucleic Acids Res. 2000 Jan 1;28(1):60-4. doi: 10.1093/nar/28.1.60.
The EcoGene database provides a set of gene and protein sequences derived from the genome sequence of Escherichia coli K-12. EcoGene is a source of re-annotated sequences for the SWISS-PROT and Colibri databases. EcoGene is used for genetic and physical map compilations in collaboration with the Coli Genetic Stock Center. The EcoGene12 release includes 4293 genes. EcoGene12 differs from the GenBank annotation of the complete genome sequence in several ways, including (i) the revision of 706 predicted or confirmed gene start sites, (ii) the correction or hypothetical reconstruction of 61 frame-shifts caused by either sequence error or mutation, (iii) the reconstruction of 14 protein sequences interrupted by the insertion of IS elements, and (iv) predictions that 92 genes are partially deleted gene fragments. A literature survey identified 717 proteins whose N-terminal amino acids have been verified by sequencing. 12 446 cross-references to 6835 literature citations are provided. EcoGene is accessible at a new website: http://bmb.med.miami.edu/EcoGene/EcoWeb. Users can search and retrieve individual EcoGene GenePages or they can download large datasets for incorporation into database management systems, facilitating various genome-scale computational and functional analyses.