Salgado Heladia, Santos-Zavaleta Alberto, Gama-Castro Socorro, Peralta-Gil Martín, Peñaloza-Spínola Mónica I, Martínez-Antonio Agustino, Karp Peter D, Collado-Vides Julio
Bioinformatics Research Group, SRI International, 333 Ravenswood Ave EK207, Menlo Park CA 94025 USA.
BMC Bioinformatics. 2006 Jan 6;7:5. doi: 10.1186/1471-2105-7-5.
Escherichia coli is the model organism whose regulatory network is the most extensively characterized. Over the last few years, our project has collected and curated the literature on E. coli transcription initiation and operons, providing in both the RegulonDB and EcoCyc databases the largest electronically encoded network available. A recent paper by Ma et al. (2004) reported several differences between the versions of the network present in these two databases. We have corrected these discrepancies and added annotations from this and other groups (Shen-Orr et al., 2002), making the RegulonDB and EcoCyc databases the largest comprehensive and continuously curated regulatory network of E. coli K-12.
Several groups have used these curated data in their bioinformatics and systems biology projects, combining them with data from other sources and thereby enlarging the E. coli K-12 regulatory network dataset initially obtained from either RegulonDB or EcoCyc. The groups of Uri Alon and Hong-Wu Ma kindly provided the interactions they had added to enrich their public versions of the E. coli regulatory network. We used these interactions to search for the original references and curated them by the same standards we apply routinely, in several cases adding the original references (in place of reviews or missing references) together with the corresponding experimental evidence codes. We also corrected all discrepancies between the two databases, as explained below.
As a result of this specific curation effort, 150 new interactions have been added to our databases, in addition to those added through our continuous curation work. RegulonDB gene names are now based on those of EcoCyc to avoid confusion arising from gene names and synonyms, and the public releases of RegulonDB and EcoCyc are henceforth synchronized to avoid confusion arising from different versions. Public flat files provide direct access to the regulatory network interactions, avoiding errors caused by differences in database modelling and representation. The regulatory network available in RegulonDB and EcoCyc is the most comprehensive and regularly updated electronically encoded regulatory network of E. coli K-12.
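To illustrate why shared gene names and flat files matter when comparing two versions of a network, here is a minimal Python sketch. It assumes a hypothetical tab-separated layout (regulator, target, effect) and a hypothetical synonym table; neither is the actual RegulonDB or EcoCyc file format, and the gene names in the usage note are placeholders rather than real data.

```python
# Hypothetical sketch: compare two regulatory-interaction lists after
# mapping gene synonyms to one canonical name. The file layout and all
# names are illustrative assumptions, not the RegulonDB/EcoCyc formats.
import csv
import io


def load_interactions(handle, synonyms):
    """Read tab-separated (regulator, target, effect) rows into a set.

    `synonyms` maps a lower-cased alternative name to its canonical name.
    """
    interactions = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and rows with missing fields.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        regulator, target, effect = (field.strip() for field in row[:3])
        interactions.add((
            synonyms.get(regulator.lower(), regulator.lower()),
            synonyms.get(target.lower(), target.lower()),
            effect,
        ))
    return interactions


def compare(a, b):
    """Split two interaction sets into shared and source-specific parts."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}
```

With a placeholder synonym table such as `{"tfa_old": "tfa"}`, the rows `tfA -> geneB` in one file and `tfA_old -> geneB` in the other are counted as the same interaction. Without that normalization they would be reported as a discrepancy, which is the kind of spurious difference that shared gene names are meant to prevent.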