Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos 62210, México.
Department of Biomedical Engineering, Boston University, Boston, MA, USA.
Nucleic Acids Res. 2019 Jan 8;47(D1):D212-D220. doi: 10.1093/nar/gky1077.
RegulonDB, first published 20 years ago, is a comprehensive electronic resource on the regulation of transcription initiation in Escherichia coli K-12, containing decades of knowledge from classic molecular biology experiments and, more recently, from high-throughput genomic methodologies. We curated the literature to keep RegulonDB up to date, and initiated curation of ChIP and gSELEX experiments. We estimate that current knowledge describes between 10% and 30% of the expected total number of transcription factor-gene regulatory interactions in E. coli. RegulonDB provides datasets of interactions for which there is no evidence that they affect expression, as well as expression datasets. We developed a proof-of-concept pipeline that merges binding and expression evidence to identify regulatory interactions. These datasets can be visualized in the RegulonDB JBrowse. We developed the Microbial Conditions Ontology, a controlled vocabulary for the minimal properties needed to reproduce an experiment, which helps integrate data from high-throughput experiments and the classic literature. At a higher level of integration, we report Genetic Sensory-Response Units for 200 transcription factors, including their regulation at the metabolic level, and include summaries for 70 of them. Finally, we summarize our research on natural language processing strategies to enhance our biocuration work.