Pinho Armando J, Neves António J R, Afreixo Vera, Bastos Carlos A C, Ferreira Paulo J S G
Signal Processing Laboratory, DETI/IEETA, University of Aveiro, 3810-193 Aveiro, Portugal.
IEEE Trans Biomed Eng. 2006 Nov;53(11):2148-55. doi: 10.1109/TBME.2006.879477.
It is known that the protein-coding regions of DNA are usually characterized by a three-base periodicity. In this paper, we exploit this property by studying a DNA model based on three deterministic states, where each state implements a finite-context model. The experimental results confirm the appropriateness of the proposed approach, showing compression gains relative to the corresponding single finite-context model. Additionally, and potentially more interesting than the compression gain itself, we observe that the entropy associated with each of the three base positions of a codon differs, and that this variation is not the same among the organisms analyzed.
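The three-state idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `code_length`, the context order, the additive estimator with parameter `alpha`, and the choice of state as the position modulo 3 (which assumes the sequence starts in frame) are all assumptions made for illustration. Each state keeps its own table of context counts, and the accumulated code length per state gives a rough per-codon-position entropy estimate.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: the sequence contains only these four symbols


def code_length(seq, order=2, n_states=3, alpha=1.0):
    """Adaptive code length, in bits, of seq under n_states finite-context models.

    Hypothetical helper: the state is the position modulo n_states, and
    probabilities use additive smoothing with parameter alpha.
    Returns (total_bits, bits_per_state, symbols_per_state).
    """
    # One table per state, mapping a context string to per-symbol counts.
    counts = [defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
              for _ in range(n_states)]
    bits = [0.0] * n_states
    symbols = [0] * n_states
    for i in range(order, len(seq)):
        state = i % n_states              # deterministic state selection
        context = seq[i - order:i]        # the previous `order` symbols
        table = counts[state][context]
        total = sum(table.values())
        p = (table[seq[i]] + alpha) / (total + alpha * len(ALPHABET))
        bits[state] += -math.log2(p)      # ideal code length of this symbol
        symbols[state] += 1
        table[seq[i]] += 1                # update only after coding the symbol
    return sum(bits), bits, symbols


if __name__ == "__main__":
    seq = "ACG" * 100                     # toy sequence with period 3
    single_total, _, _ = code_length(seq, order=0, n_states=1)
    three_total, per_state, n = code_length(seq, order=0, n_states=3)
    print("single model bits:", single_total)
    print("three-state bits: ", three_total)
    print("bits per symbol by position:",
          [b / k for b, k in zip(per_state, n)])
```

Setting `n_states=1` gives the single finite-context model used as the baseline for comparison, and dividing each entry of `bits_per_state` by the matching symbol count gives an average number of bits per base for each position.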