Department of Computer Science, Wellesley College, Wellesley, MA 02481, USA.
Methods. 2020 Apr 1;176:62-70. doi: 10.1016/j.ymeth.2019.03.026. Epub 2019 Apr 4.
An operon is a set of neighboring genes in a genome that is transcribed as a single polycistronic message. Genes in the same operon often have related functional roles or participate in the same metabolic pathways. Most bacterial genes are co-transcribed with one or more other genes as part of a multi-gene operon, so accurate identification of operons is important for understanding the co-regulation of genes and their functional relationships. Here, we present a computational system that uses RNA-seq data to determine operons throughout a genome. The system takes the name of a genome and one or more files of RNA-seq data as input. Our method combines primary genomic sequence information with expression data from the RNA-seq files in a unified probabilistic model to identify operons. We assess our method's ability to accurately identify operons in a range of species through comparison to external databases of operons, both experimentally confirmed and computationally predicted, and through focused experiments that confirm new operons identified by our method. Our system is freely available at https://cs.wellesley.edu/~btjaden/Rockhopper/.
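The abstract does not describe the form of the probabilistic model, so the following Python sketch is only an illustration of the general idea of combining a sequence-derived feature with an expression-derived feature. It is not the Rockhopper implementation. It assumes a naive Bayes style combination of intergenic distance and the similarity of expression between adjacent genes; the `Gene` class, the function names, and every numeric parameter are hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int          # genomic start coordinate
    end: int            # genomic end coordinate
    strand: str         # "+" or "-"
    expression: float   # hypothetical normalized RNA-seq expression level


def gaussian(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution, used here as a toy likelihood."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def same_operon_probability(a: Gene, b: Gene, prior: float = 0.5) -> float:
    """Posterior probability that adjacent genes a and b share an operon.

    Assumption: the two features are treated as conditionally independent
    (naive Bayes). All likelihood parameters below are illustrative values,
    not values taken from the paper.
    """
    if a.strand != b.strand:
        return 0.0  # genes on opposite strands cannot be co-transcribed

    # Sequence-derived feature: gap between the two genes, in base pairs.
    distance = b.start - a.end
    # Expression-derived feature: absolute log2 fold difference in expression.
    log_ratio = abs(math.log2((a.expression + 1) / (b.expression + 1)))

    like_operon = gaussian(distance, 20, 60) * gaussian(log_ratio, 0, 1.0)
    like_not = gaussian(distance, 200, 150) * gaussian(log_ratio, 0, 3.0)

    numerator = prior * like_operon
    return numerator / (numerator + (1 - prior) * like_not)


def predict_operons(genes: list[Gene], threshold: float = 0.5) -> list[list[str]]:
    """Chain adjacent genes into operons when their posterior meets the threshold."""
    ordered = sorted(genes, key=lambda g: g.start)
    operons: list[list[Gene]] = []
    for gene in ordered:
        if operons and same_operon_probability(operons[-1][-1], gene) >= threshold:
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return [[g.name for g in operon] for operon in operons]
```

In this sketch, two closely spaced genes on the same strand with similar expression receive a high posterior and are merged, while a large gap, a strand change, or a sharp drop in expression starts a new operon. A real system would estimate the likelihoods from data rather than fix them by hand.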