Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions.

Author information

Kotlar Daniel, Lavner Yizhar

Affiliation

Department of Computer Science, Tel-Hai Academic College, Upper Galilee 12210, Israel.

Publication information

Genome Res. 2003 Aug;13(8):1930-7. doi: 10.1101/gr.1261703. Epub 2003 Jul 17.

Abstract

A new measure for gene prediction in eukaryotes is presented. The measure is based on the Discrete Fourier Transform (DFT) phase at a frequency of 1/3, computed for the four binary sequences for A, T, C, and G. Analysis of all the experimental genes of S. cerevisiae revealed distribution of the phase in a bell-like curve around a central value, in all four nucleotides, whereas the distribution of the phase in the noncoding regions was found to be close to uniform. Similar findings were obtained for other organisms. Several measures based on the phase property are proposed. The measures are computed by clockwise rotation of the vectors, obtained by DFT for each analysis frame, by an angle equal to the corresponding central value. In protein coding regions, this rotation is assumed to closely align all vectors in the complex plane, thereby amplifying the magnitude of the vector sum. In noncoding regions, this operation does not significantly change this magnitude. Computing the measures with one chromosome and applying them on sequences of others reveals improved performance compared with other algorithms that use the 1/3 frequency feature, especially in short exons. The phase property is also used to find the reading frame of the sequence.
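
The rotation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so several choices here are assumptions rather than the authors' method: the central phase of each nucleotide is estimated as a circular mean over training frames, the measure is taken as the squared magnitude of the rotated vector sum, and the reading-frame estimate is read off the phase of that sum. All function names (`dft_third`, `central_phases`, `rotation_measure`, `frame_offset`) are hypothetical.

```python
import cmath
import math

BASES = "ATCG"


def dft_third(seq, base):
    """DFT coefficient at frequency 1/3 of the binary indicator sequence of `base`."""
    w = -2j * math.pi / 3
    return sum(cmath.exp(w * n) for n, s in enumerate(seq) if s == base)


def central_phases(training_frames):
    """Assumed estimator: circular mean of the 1/3-frequency phase per nucleotide."""
    centers = {}
    for b in BASES:
        total = 0j
        for frame in training_frames:
            u = dft_third(frame, b)
            if abs(u) > 1e-12:          # skip frames where the phase is undefined
                total += u / abs(u)     # unit vector carrying only the phase
        centers[b] = cmath.phase(total)
    return centers


def rotated_sum(frame, centers):
    """Rotate each nucleotide's vector clockwise by its central phase and add them."""
    return sum(dft_third(frame, b) * cmath.exp(-1j * centers[b]) for b in BASES)


def rotation_measure(frame, centers):
    """Assumed measure: squared magnitude of the rotated vector sum."""
    return abs(rotated_sum(frame, centers)) ** 2


def frame_offset(frame, centers):
    """Assumed reading-frame estimate: a shift of s positions turns the sum by 2*pi*s/3."""
    return round(cmath.phase(rotated_sum(frame, centers)) / (2 * math.pi / 3)) % 3


# Toy usage: central phases come from one "chromosome" and are applied to other frames.
centers = central_phases(["ATG" * 10])
coding_score = rotation_measure("TGA" * 10, centers)
shift = frame_offset("TGA" * 10, centers)
```

In this toy setting the training data is a perfectly periodic string, so the aligned vectors add constructively, while a sequence with no period-3 component contributes almost nothing. Real central values would be estimated from annotated genes, as the abstract describes for S. cerevisiae.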
