Audit Benjamin, Ouzounis Christos A
Wellcome Trust Genome Campus, Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge, CB10 1SD, UK.
J Mol Biol. 2003 Sep 19;332(3):617-33. doi: 10.1016/s0022-2836(03)00811-8.
The availability of complete genome sequences for a large variety of organisms is a major advance in understanding genome structure and function. One attribute of genome structure is chromosome organisation in terms of gene localisation and orientation. For example, bacterial operons, i.e. clusters of co-oriented genes that form transcription units, enable functionally related genes to be expressed simultaneously. The description of genome organisation was pioneered by studies of the distribution of genes on the partial genetic map of Escherichia coli, before the full genome sequence was known. Deploying powerful techniques from circular statistics and signal processing, we revisit the issue of gene localisation and orientation using 89 complete microbial chromosomes from the eubacterial and archaeal domains. We demonstrate that there is no characteristic size pertinent to the description of chromosome structure; for example, no single length is appropriate to describe gene clustering. Our results show that, for all 89 chromosomes, gene positions and gene orientations share a common form of scale-invariant correlations, known as "long-range correlations", that we can reveal for distances ranging from the gene length up to the chromosome size. This observation indicates that genes tend to assemble and to co-orient over any scale of observation greater than a few kilobases. This unexpected property of chromosome structure can be portrayed as an operon-like organisation at all scales, and it implies that a complete scale range extending over more than three orders of magnitude of chromosome segment lengths is necessary to properly describe prokaryotic genome organisation. We propose that this pattern results from the effects of the superhelical context on gene expression coupled with the structure and dynamics of the nucleoid, possibly accommodating the diverse gene expression profiles needed during the different stages of cellular life.
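The abstract names circular statistics and correlation analysis but gives no procedural detail, so the following is only an illustrative sketch of the kind of quantities involved, not the authors' method. It assumes gene positions are given in base pairs on a circular chromosome and gene orientations are coded as +1 or -1 per strand. The function names `mean_resultant_length` and `orientation_autocorrelation` are hypothetical, and the input values are toy numbers rather than data from the study.

```python
import math


def mean_resultant_length(positions, chromosome_length):
    """Map positions (bp) on a circular chromosome to angles and return
    (R, mean_angle). R is near 0 for evenly spread positions and near 1
    when positions are tightly clustered."""
    angles = [2 * math.pi * p / chromosome_length for p in positions]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    r = math.hypot(c, s)
    mean_angle = math.atan2(s, c) % (2 * math.pi)
    return r, mean_angle


def orientation_autocorrelation(strands, lag):
    """Normalised autocorrelation of a +1/-1 strand sequence at a given
    lag (in gene-index units), wrapping around the circular chromosome."""
    n = len(strands)
    mean = sum(strands) / n
    var = sum((x - mean) ** 2 for x in strands) / n
    if var == 0:
        # All genes on the same strand: correlation is undefined, report 0.
        return 0.0
    cov = sum(
        (strands[i] - mean) * (strands[(i + lag) % n] - mean)
        for i in range(n)
    ) / n
    return cov / var


if __name__ == "__main__":
    # Toy inputs (assumed for illustration only).
    toy_positions = [100, 150, 220, 300]
    toy_strands = [1, 1, 1, 1, -1, -1, -1, -1]
    print(mean_resultant_length(toy_positions, 1000))
    print([orientation_autocorrelation(toy_strands, k) for k in range(5)])
```

An autocorrelation that decays slowly, roughly as a power law of the lag rather than exponentially, is the usual signature of long-range correlations. Establishing that on real chromosomes requires far longer sequences and more careful estimators than this sketch.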