Gonzalez Diego Luis, Giannerini Simone, Rosa Rodolfo
CNR-Fondazione scuola di S. Giorgio, I-30124, Venezia, Italy.
Phys Rev E Stat Nonlin Soft Matter Phys. 2008 Nov;78(5 Pt 1):051918. doi: 10.1103/PhysRevE.78.051918. Epub 2008 Nov 19.
The study of correlation structures in DNA sequences is of great interest because it allows us to obtain structural and functional information about underlying genetic mechanisms. In this paper we present a study of the correlation structure of protein coding sequences of DNA based on a recently developed mathematical representation of the genetic code. A fundamental consequence of this representation is that codons can be assigned a parity class (odd or even). This parity can be obtained by means of a nonlinear algorithm acting on the chemical character of the codon bases. In the same setting, Rumer's class can be naturally described and a new dichotomic class, the hidden class, can be defined. Moreover, we show that the set of DNA base transformations associated with the three dichotomic classes can be put in a compact group-theoretic framework. We use the dichotomic classes as a coding scheme for DNA sequences and study the mutual dependence between such classes. The same analysis is also carried out on the chemical dichotomies of DNA bases. In both cases, the statistical analysis is performed using an entropy-based dependence metric possessing many desirable properties. We obtain meaningful tests for mutual dependence by using suitable resampling techniques. We find strong short-range correlations between certain combinations of dichotomic codon classes. These results support our previous hypothesis that codon classes might play an active role in the organization of genetic information.
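The workflow summarized in the abstract (encode codons as a binary class, measure dependence between positions, test it by resampling) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the Rumer class is assigned from the eight fourfold-degenerate dinucleotide roots of the standard genetic code, plain mutual information stands in for the entropy-based dependence metric (which the abstract does not specify), and a simple permutation shuffle stands in for the resampling scheme. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Assumption: Rumer class taken from the standard genetic code, where these
# eight dinucleotide roots fix the amino acid regardless of the third base.
FOURFOLD_ROOTS = {"CU", "GU", "UC", "CC", "AC", "GC", "CG", "GG"}


def rumer_class(codon):
    """Return 1 if the first two bases determine the amino acid, else 0."""
    root = codon.upper().replace("T", "U")[:2]
    return 1 if root in FOURFOLD_ROOTS else 0


def encode(seq):
    """Split a coding sequence into codons and map each to its binary class."""
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    return [rumer_class(seq[i:i + 3]) for i in range(0, usable, 3)]


def mutual_information(xs, ys):
    """Mutual information in bits between two equally long symbol lists.

    Stand-in for the paper's dependence metric, which is not given here.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def lagged_mi(bits, lag):
    """Dependence between the class at codon t and at codon t + lag (lag >= 1)."""
    return mutual_information(bits[:-lag], bits[lag:])


def permutation_p_value(bits, lag, n_perm=200, seed=0):
    """Permutation test: shuffling destroys order but keeps class frequencies."""
    rng = random.Random(seed)
    observed = lagged_mi(bits, lag)
    shuffled = list(bits)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lagged_mi(shuffled, lag) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(encode(sequence), lag=1)` on a coding sequence would give a rough indication of whether neighbouring codons share class information beyond what the class frequencies alone explain. The parity and hidden classes are not covered here because the abstract does not state how they are computed.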