Kotlar Daniel, Lavner Yizhar
Department of Computer Science, Tel-Hai Academic College, Upper Galilee 12210, Israel.
Genome Res. 2003 Aug;13(8):1930-7. doi: 10.1101/gr.1261703. Epub 2003 Jul 17.
A new measure for gene prediction in eukaryotes is presented. The measure is based on the phase of the Discrete Fourier Transform (DFT) at a frequency of 1/3, computed for the four binary indicator sequences of A, T, C, and G. Analysis of all the experimental genes of S. cerevisiae revealed that, for each of the four nucleotides, the phase is distributed in a bell-like curve around a central value, whereas in the noncoding regions the phase distribution was found to be close to uniform. Similar findings were obtained for other organisms. Several measures based on this phase property are proposed. They are computed by rotating the vectors obtained by DFT for each analysis frame clockwise, each by an angle equal to the corresponding central value. In protein coding regions, this rotation is assumed to closely align all vectors in the complex plane, thereby amplifying the magnitude of the vector sum. In noncoding regions, the rotation does not significantly change this magnitude. Computing the measures with one chromosome and applying them to sequences of others reveals improved performance compared with other algorithms that use the 1/3 frequency feature, especially in short exons. The phase property is also used to find the reading frame of the sequence.
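The rotate-and-sum idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`dft_third`, `phases`, `rotated_sum_magnitude`) are hypothetical, the toy sequence is artificial, and the central phase values are taken from the toy sequence itself rather than estimated from a training chromosome as in the paper.

```python
import cmath
import math

BASES = "ACGT"

def dft_third(seq, base):
    """DFT coefficient at frequency 1/3 of the binary indicator sequence of `base`."""
    return sum(cmath.exp(-2j * math.pi * n / 3)
               for n, b in enumerate(seq) if b == base)

def phases(seq):
    """Phase (in radians) of the 1/3-frequency coefficient for each nucleotide."""
    return {b: cmath.phase(dft_third(seq, b)) for b in BASES}

def rotated_sum_magnitude(seq, central):
    """Rotate each nucleotide's vector clockwise by its central phase
    (multiplication by exp(-i*theta)), then return the magnitude of the sum."""
    total = sum(dft_third(seq, b) * cmath.exp(-1j * central[b]) for b in BASES)
    return abs(total)

# Toy frame with perfect period-3 structure (an assumption for illustration only).
frame = "ATG" * 10
# Assumed central values: here simply the phases of the toy frame itself.
central = phases(frame)

plain = abs(sum(dft_third(frame, b) for b in BASES))
aligned = rotated_sum_magnitude(frame, central)
print(f"unrotated |sum| = {plain:.3f}, rotated |sum| = {aligned:.3f}")
```

In this toy frame the A, T, and G vectors point 120 degrees apart, so they cancel when summed directly. After each is rotated by its own central phase they align, and the magnitude of the sum becomes the sum of the individual magnitudes. For a frame whose phases are unrelated to the central values, the rotation would not be expected to produce such alignment.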