Audit Benjamin, Vaillant Cédric, Arnéodo Alain, d'Aubenton-Carafa Yves, Thermes Claude
Centre de Recherche Paul Pascal, avenue Schweitzer, 33600 Pessac, France.
J Biol Phys. 2004 Mar;30(1):33-81. doi: 10.1023/B:JOBP.0000016438.86794.8e.
Previous analyses of genomic DNA sequences have shown that base pairs are correlated at large distances with scale-invariant statistical properties. We show in the present study that these correlations between nucleotides (letters) in fact result from long-range correlations (LRC) between sequence-dependent DNA structural elements (words) involved in the packaging of DNA in chromatin. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. This exploration through the optics of the so-called 'wavelet transform microscope' reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. We focus here on the existence of LRC in the small-scale regime (≲ 200 bp). Analysis of genomes in the three kingdoms reveals that this regime is specifically associated with the presence of nucleosomes. Indeed, small-scale LRC are observed in eukaryotic genomes and, to a lesser extent, in archaeal genomes, in contrast with their absence in eubacterial genomes. Similarly, this regime is observed in eukaryotic but not in bacterial viral DNA genomes. The one exception is the genomes of poxviruses, the only animal DNA viruses that do not replicate in the cell nucleus, which do not present small-scale LRC. Furthermore, no small-scale LRC are detected in the genomes of any of the examined RNA viruses, with the exception of retroviruses. Altogether, these results strongly suggest that small-scale LRC are a signature of the nucleosomal structure. Finally, we discuss possible interpretations of these small-scale LRC in terms of the mechanisms that govern the positioning, the stability and the dynamics of the nucleosomes along the DNA chain.
This paper is mainly devoted to a pedagogical presentation of the theoretical concepts and physical methods that are well suited to the statistical analysis of genomic sequences. We review the results obtained with the so-called wavelet-based multifractal analysis when investigating the DNA sequences of various organisms in the three kingdoms. Some of these results have been announced in B. Audit et al. [1, 2].
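To make the idea of a wavelet-based correlation analysis concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a purine/pyrimidine coding of the sequence (A, G = +1; C, T = -1), a Mexican-hat analyzing wavelet, and a fixed set of scales; the function names `dna_walk`, `wavelet_std` and `estimate_hurst` are hypothetical. It estimates a single roughness (Hurst-like) exponent H from how the spread of the wavelet coefficients of the "DNA walk" grows with scale, which is far simpler than a full multifractal analysis.

```python
import numpy as np

def dna_walk(seq):
    """Cumulative sum of a +1/-1 coding of the sequence (assumed coding:
    purines A, G = +1; pyrimidines C, T = -1), with the mean step removed."""
    steps = np.array([1.0 if c in "AG" else -1.0 for c in seq.upper()])
    return np.cumsum(steps - steps.mean())

def wavelet_std(walk, scale):
    """Standard deviation of the wavelet coefficients of `walk` at one scale,
    using a Mexican-hat wavelet sampled on [-5*scale, 5*scale]."""
    t = np.arange(-5 * scale, 5 * scale + 1) / scale
    kernel = (1.0 - t**2) * np.exp(-t**2 / 2.0) / scale  # 1/a normalization
    kernel -= kernel.mean()  # enforce an exactly zero-mean discrete wavelet
    coeffs = np.convolve(walk, kernel, mode="valid")  # drop edge effects
    return coeffs.std()

def estimate_hurst(seq, scales=(4, 8, 16, 32, 64)):
    """Slope of log2(std of wavelet coefficients) versus log2(scale).
    With the 1/a normalization above, this slope estimates H."""
    walk = dna_walk(seq)
    stds = [wavelet_std(walk, a) for a in scales]
    slope, _intercept = np.polyfit(np.log2(scales), np.log2(stds), 1)
    return slope
```

For a sequence with no correlations between letters, the walk is an ordinary random walk and the estimate should come out near H = 0.5; values above 0.5 would indicate persistent long-range correlations over the range of scales used. Comparing fits restricted to scales below and above roughly 100-200 bp is one way to look for the two regimes described above.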