Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.
Nucleic Acids Res. 2011 Jan;39(1):e3. doi: 10.1093/nar/gkq891. Epub 2010 Oct 14.
Identifying protein-coding regions in DNA sequences is an active problem in computational biology. In this study, we present a self-adaptive spectral rotation (SASR) approach, which visualizes coding regions in DNA sequences based on the triplet periodicity property, without any preceding training process. It is intended to support rough prediction of coding regions when the extra information required for training by other leading methods is unavailable. In this approach, at each position in the DNA sequence, a Fourier spectrum is calculated from the posterior subsequence. From these spectra, a random walk in the complex plane is generated as the SASR's graphic output. Applications of the SASR to real DNA data show that patterns in the graphic output reveal the locations of the coding regions and the frame shifts between them: arcs indicate coding regions, stable points indicate non-coding regions, and the shapes of corners reveal frame shifts. Tests on a genomic data set from Saccharomyces cerevisiae reveal that the graphic patterns for coding and non-coding regions differ greatly, so that the coding regions can be visually distinguished. Meanwhile, a time cost test shows that the SASR can be easily implemented with a computational complexity of O(N).
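The idea of computing a spectrum from the subsequence following each position and accumulating the results into a walk in the complex plane can be illustrated with a minimal Python sketch. It is not the SASR algorithm itself: the abstract does not specify the self-adaptive rotation rule, so the step used here is simply the period-3 Fourier component of each window. The base-to-complex mapping `BASE_VALUE`, the `window` parameter, and the function names `period3_steps` and `complex_walk` are all assumptions made for illustration. Prefix sums keep the cost linear in the sequence length.

```python
import cmath
from itertools import accumulate

# Assumed numeric mapping of bases to complex values (illustrative only,
# not taken from the paper). Unknown symbols such as "N" map to 0.
BASE_VALUE = {"A": 1 + 0j, "C": 1j, "G": -1 + 0j, "T": -1j}


def period3_steps(seq, window):
    """Period-3 Fourier component of seq[n:n+window] for every start n.

    Each component is divided by the window length. The phase is referenced
    to the absolute position in the sequence, so a region with a stable
    reading frame yields steps that point in a stable direction.
    """
    omega = cmath.exp(-2j * cmath.pi / 3)  # e^{-2*pi*i/3}
    prefix = [0j]  # prefix[k] = sum over the first k bases
    for k, base in enumerate(seq.upper()):
        prefix.append(prefix[-1] + BASE_VALUE.get(base, 0j) * omega ** (k % 3))
    # One subtraction per window, so the whole pass is O(N).
    return [
        (prefix[n + window] - prefix[n]) / window
        for n in range(len(seq) - window + 1)
    ]


def complex_walk(seq, window):
    """Cumulative sum of the per-position steps: a walk in the complex plane."""
    return list(accumulate(period3_steps(seq, window)))


if __name__ == "__main__":
    periodic = "ATG" * 20   # toy sequence with perfect triplet periodicity
    flat = "A" * 60         # toy sequence with no triplet structure
    print(abs(complex_walk(periodic, 12)[-1]))
    print(abs(complex_walk(flat, 12)[-1]))
```

With a window length that is a multiple of 3, the perfectly periodic toy sequence drifts steadily away from the origin, while the homopolymer stays at the origin up to floating-point error. Real coding and non-coding regions are far noisier than these toy inputs.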