Jiang Rong, Yan Hong
School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia.
Int J Data Min Bioinform. 2008;2(1):15-35. doi: 10.1504/ijdmb.2008.016754.
This paper presents a new segmentation method based on spectral analysis to locate the borders between short protein coding regions and non-coding regions. We formulate a double curve representation of a DNA sequence and apply a local three-codon measurement to the discrete Fourier spectral features at frequency 1/3 to identify short protein coding regions. The proposed spectral segmentation method based on double curves requires no prior knowledge of the DNA data. Our simulation results show that the proposed method identifies short coding regions in DNA sequences considerably more accurately than methods that analyse DNA sequences directly.
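The spectral feature at frequency 1/3 mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the double curve representation or the three-codon measurement, so the code below does not implement them. It assumes the classic binary indicator sequences for A, C, G and T instead, which is one way of analysing the DNA sequence directly. The function names, the window length and the step size are hypothetical choices made for illustration only.

```python
import cmath

# The three cube roots of unity: exp(-2*pi*i*n/3) depends only on n mod 3.
_ROOTS = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]


def period3_power(window):
    """DFT power at frequency 1/3, summed over the four base indicator
    sequences and divided by the window length.

    Assumption: binary indicator sequences, not the paper's double curves.
    """
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the indicator sequence of `base` at frequency 1/3.
        coeff = sum(_ROOTS[n % 3] for n, ch in enumerate(window) if ch == base)
        total += abs(coeff) ** 2
    return total / len(window)


def sliding_period3(seq, window=60, step=3):
    """Return (start, power) pairs for windows slid along the sequence.

    `window` and `step` are illustrative defaults, not values from the paper.
    """
    seq = seq.upper()
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy input: a strongly period-3 stretch followed by a homopolymer run.
    toy = "ATG" * 20 + "A" * 60
    for start, power in sliding_period3(toy):
        print(start, round(power, 3))
```

In this toy input the power is high over the period-3 stretch and falls towards zero as the window moves into the homopolymer run. Such a drop is the kind of signal a border-location method could look for. Short coding regions give few samples per window, which is the difficulty the abstract says the proposed method addresses.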