Department of Computer Science (DI), NeuRoNe Lab, University of Salerno, Via Ponte don Melillo, 84084 Fisciano (SA), Italy.
BMC Bioinformatics. 2014 May 16;15:145. doi: 10.1186/1471-2105-15-145.
Inferring operon maps is crucial to understanding the regulatory networks of prokaryotic genomes. Recent RNA-seq based transcriptome studies have revealed that, in many bacterial species, operon structure varies as environmental conditions change. New computational solutions that use both static data (genome sequence) and dynamic data (expression profiles) are therefore needed to produce condition-specific operon predictions.
In this work, we propose a novel classification method that integrates RNA-seq based transcriptome profiles with genomic sequence features to accurately identify the operons that are expressed under a measured condition. The classifiers are trained on a small set of confirmed operons and then used to classify the remaining gene pairs of the organism studied. Finally, by linking consecutive gene pairs classified as operon pairs, our computational approach produces condition-dependent operon maps. We evaluated our approach on various RNA-seq expression profiles of the bacteria Haemophilus somni, Porphyromonas gingivalis, Escherichia coli and Salmonella enterica. Our results demonstrate that, using features that depend on both transcriptome dynamics and genome sequence characteristics, we can identify operon pairs with high accuracy. Moreover, combining DNA sequence and expression data yields more accurate predictions than either data type alone.
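The final linking step can be illustrated with a minimal sketch. It assumes the genes are already ordered along one strand and that a classifier has produced one boolean label per adjacent gene pair; the function name `link_pairs_into_operons`, the input layout, and the treatment of unlinked genes as single-gene units are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def link_pairs_into_operons(
    genes: Sequence[str], pair_is_operon: Sequence[bool]
) -> List[List[str]]:
    """Group ordered genes into transcription units.

    genes: gene identifiers in genomic order (assumed to be on the same strand).
    pair_is_operon: pair_is_operon[i] is the predicted label for the
        adjacent pair (genes[i], genes[i + 1]) under one condition.
    """
    if len(pair_is_operon) != max(len(genes) - 1, 0):
        raise ValueError("expected one label per adjacent gene pair")

    operons: List[List[str]] = []
    current: List[str] = []
    for i, gene in enumerate(genes):
        current.append(gene)
        is_last_gene = i == len(genes) - 1
        # Close the current unit at the end of the list or when the
        # pair (gene, next gene) was not classified as an operon pair.
        if is_last_gene or not pair_is_operon[i]:
            operons.append(current)
            current = []
    return operons
```

Running the same function on pair labels predicted from different RNA-seq conditions would give one map per condition, which is the sense in which the resulting maps are condition-dependent. Genes with no positive pair on either side come out as single-gene units in this sketch.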
We present a computational strategy for the comprehensive analysis of condition-dependent operon maps in prokaryotes. Our method can be used to generate condition-specific operon maps for the many bacterial organisms for which high-resolution transcriptome data are available.