Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA.
Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kgs. Lyngby, Denmark.
Nucleic Acids Res. 2023 Oct 27;51(19):10176-10193. doi: 10.1093/nar/gkad750.
Transcriptomic data is accumulating rapidly; thus, scalable methods for extracting knowledge from this data are critical. Here, we assembled a top-down expression and regulation knowledge base for Escherichia coli. The expression component is a 1035-sample, high-quality RNA-seq compendium consisting of data generated in our lab using a single experimental protocol. The compendium contains diverse growth conditions, including 9 media; 39 supplements, including antibiotics; 42 heterologous proteins; and 76 gene knockouts. Using this resource, we elucidated global expression patterns. We used machine learning to extract 201 modules that account for 86% of known regulatory interactions, creating the regulatory component. With these modules, we identified two novel regulons and quantified systems-level regulatory responses. We also integrated 1675 curated, publicly available transcriptomes into the resource. We demonstrated workflows for analyzing new data against this knowledge base via deconstruction of regulation during aerobic transition. This resource illuminates the E. coli transcriptome at scale and provides a blueprint for top-down transcriptomic analysis of non-model organisms.