Anastassiou D
Department of Electrical Engineering, Columbia University, 500 West 120th Street, Mail Code 4712, New York, NY 10027, USA.
Bioinformatics. 2000 Dec;16(12):1073-81. doi: 10.1093/bioinformatics/16.12.1073.
Frequency-domain analysis of biomolecular sequences is hindered by their representation as strings of characters. If numerical values are assigned to each of these characters, then the resulting numerical sequences are readily amenable to digital signal processing.
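One common way to make the character-to-number assignment concrete is to split a DNA string into four binary indicator sequences, one per base, and take a discrete Fourier transform of each. The sketch below assumes that mapping; the function names `indicator_sequences` and `dft_coefficient` are hypothetical and chosen only for illustration. It uses the standard library only and computes a single coefficient by direct summation.

```python
import cmath

def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base.

    Assumption: each base gets its own 0/1 sequence; other numerical
    assignments are possible.
    """
    dna = dna.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in dna] for base in "ACGT"}

def dft_coefficient(x, k):
    """Return the k-th DFT coefficient of the sequence x by direct summation."""
    n = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))

# Illustrative input only, not data from the paper.
seq = "ATGGCCATGGCC"
u = indicator_sequences(seq)

# Coefficient at k = N/3, the frequency that corresponds to a period of 3 bases.
coeffs = {base: dft_coefficient(u[base], len(seq) // 3) for base in "ACGT"}
```

Once the string is in this form, standard signal-processing operations (transforms, windowing, filtering) apply to each indicator sequence directly.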
We introduce new computational and visual tools for biomolecular sequence analysis. In particular, we provide an optimization procedure that improves on the performance of traditional Fourier analysis in distinguishing coding from noncoding regions in DNA sequences. We also show that the phase of a properly defined Fourier transform is a powerful predictor of the reading frame of protein coding regions. The resulting color maps help in visually identifying not only the existence of protein coding areas on both DNA strands, but also the coding direction and the reading frame of each exon. Furthermore, we demonstrate that color spectrograms can visually provide, in the form of local 'texture', significant information about biomolecular sequences, thus facilitating understanding of their local nature, structure and function.
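As a rough illustration of the traditional Fourier measure referred to above, the sketch below computes the spectral power at a period of 3 bases in a sliding window, using binary indicator sequences for the four bases, and reads off the phase of one coefficient. It is not the optimization procedure or the specific transform of the paper; the function names, the window width and the step are hypothetical choices made for the example.

```python
import cmath

def period3_coefficient(window, base):
    """DFT coefficient at frequency 1/3 for the 0/1 indicator sequence of `base`.

    For a window whose length N is a multiple of 3 this is the k = N/3 term.
    """
    return sum(cmath.exp(-2j * cmath.pi * i / 3)
               for i, ch in enumerate(window) if ch == base)

def period3_power(window):
    """Sum of squared magnitudes of the period-3 coefficients over the four bases."""
    return sum(abs(period3_coefficient(window, b)) ** 2 for b in "ACGT")

def sliding_period3_power(dna, width=30, step=3):
    """Return (start, power) pairs for windows of `width` bases (assumed values)."""
    dna = dna.upper()
    return [(start, period3_power(dna[start:start + width]))
            for start in range(0, len(dna) - width + 1, step)]

# Synthetic inputs for illustration only.
periodic = "ATG" * 10   # strong period-3 structure
flat = "A" * 30         # no period-3 structure
profile = sliding_period3_power(flat + periodic)

# Shifting the same periodic pattern by one base rotates the phase of the
# coefficient, which is why phase can carry frame information.
phase_frame0 = cmath.phase(period3_coefficient("ATG" * 10, "A"))
phase_shifted = cmath.phase(period3_coefficient("TGA" * 10, "A"))
```

In this toy input the power is large over the periodic stretch and near zero over the homopolymer, and the two phases differ by a multiple of 2π/3 depending on where the repeating unit starts. Real sequences are far noisier, so window width and thresholding would need to be chosen with care.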