Brief Bioinform. 2019 Jul 19;20(4):1568-1577. doi: 10.1093/bib/bbx088.
The rapid accumulation of fully sequenced prokaryotic genomes provides unprecedented information for systematic biological studies of bacteria and archaea. Operons are the basic functional units for conducting such studies. Here, we review DOOR (the Database of prOkaryotic OpeRons), an operon database that we previously developed and continue to update. Currently, the database contains 6 975 454 computationally predicted operons in 2072 complete genomes. The database also contains the following: (i) transcriptional units for 24 genomes, derived from publicly available transcriptomic data; (ii) orthologous gene mapping across genomes; (iii) 6408 cis-regulatory motifs for transcription factors of some operons in 203 genomes; (iv) 3 456 718 Rho-independent terminators for 2072 genomes; and (v) a suite of tools that support applications of the predicted operons. In this review, we explain how these data are computationally derived and demonstrate how they can be used to derive a wide range of higher-level information needed for systems biology studies that tackle complex and fundamental biological questions.