Edwards Martin T, Rison Stuart C G, Stoker Neil G, Wernisch Lorenz
School of Crystallography, Birkbeck College London WC1E 7HX, UK.
Nucleic Acids Res. 2005 Jun 7;33(10):3253-62. doi: 10.1093/nar/gki634. Print 2005.
An important step in understanding the regulation of a prokaryotic genome is the generation of its transcription unit map. The strongest current operon predictor depends on the distributions of intergenic distances (IGD) separating adjacent genes within and between operons. Unfortunately, experimental data on these distance distributions are limited to Escherichia coli and Bacillus subtilis. We suggest a new graph-algorithmic approach based on comparative genomics that identifies clusters of conserved genes independently of IGD and of conservation of gene order. As a consequence, the distance distribution of operon pairs can be inferred for any prokaryotic genome. For E. coli, the algorithm predicts 854 conserved adjacent pairs with a precision of 85%. The IGD distribution for these pairs is virtually identical to the E. coli operon pair distribution. Statistical analysis of the predicted pair IGD distribution allows estimation of a genome-specific operon IGD cut-off, obviating the requirement for a training set in IGD-based operon prediction. We apply the method to a representative set of eight genomes and show that these genome-specific IGD distributions differ considerably from each other and from the distribution in E. coli.
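The IGD-based step described in the abstract can be illustrated with a minimal sketch. It computes intergenic distances for adjacent same-strand genes and derives a genome-specific cut-off from the IGDs of predicted conserved pairs. The abstract does not specify the statistical analysis used, so the nearest-rank quantile rule here is an assumption made purely for illustration, and the names `Gene`, `adjacent_same_strand_igds`, `igd_cutoff` and `is_operon_pair` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def intergenic_distance(upstream: Gene, downstream: Gene) -> int:
    """Number of bases between two genes; negative when they overlap."""
    return downstream.start - upstream.end - 1


def adjacent_same_strand_igds(genes):
    """Return ((name_a, name_b), IGD) for adjacent genes on the same strand."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [
        ((a.name, b.name), intergenic_distance(a, b))
        for a, b in zip(ordered, ordered[1:])
        if a.strand == b.strand
    ]


def igd_cutoff(conserved_pair_igds, quantile=0.9):
    """Assumed rule: nearest-rank quantile of the conserved-pair IGDs.

    This stands in for the statistical analysis mentioned in the abstract,
    which is not described there.
    """
    if not conserved_pair_igds:
        raise ValueError("need at least one conserved pair IGD")
    if not 0 < quantile <= 1:
        raise ValueError("quantile must be in (0, 1]")
    values = sorted(conserved_pair_igds)
    rank = math.ceil(quantile * len(values))  # 1-based nearest rank
    return values[rank - 1]


def is_operon_pair(igd: int, cutoff: int) -> bool:
    """Call an adjacent pair an operon pair when its IGD is at most the cut-off."""
    return igd <= cutoff


# Toy coordinates, invented for illustration only.
genes = [
    Gene("a", 1, 100, "+"),
    Gene("b", 111, 200, "+"),
    Gene("c", 190, 300, "+"),
    Gene("d", 400, 500, "-"),
]
pairs = adjacent_same_strand_igds(genes)
cutoff = igd_cutoff([-4, 10, 25, 60, 300], quantile=0.5)
```

Because the cut-off is derived from the genome's own predicted pairs, no experimentally characterised training set is needed, which is the point the abstract makes; any real application would replace the quantile rule with the analysis actually used.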