Yin Changchuan, Yau Stephen S-T
Department of Mathematics, Statistics and Computer Science, The University of Illinois at Chicago, M/C 249, Chicago, IL 60607-7045, USA.
J Theor Biol. 2007 Aug 21;247(4):687-94. doi: 10.1016/j.jtbi.2007.03.038. Epub 2007 Apr 10.
With the exponential growth of genomic sequence data, there is an increasing demand for accurate identification of protein coding regions (exons) in genomic sequences. Although much progress has been made over the last two decades in identifying protein coding regions by computational methods, the performance and efficiency of the prediction methods still need to be improved. In addition, it is important to develop different prediction methods, since combining different methods may greatly improve prediction accuracy. This paper develops a new method for predicting protein coding regions, based on the fact that most exon sequences have a 3-base periodicity, while intron sequences lack this feature. The method computes the 3-base periodicity and the background noise of stepwise DNA segments of the target DNA sequence from the nucleotide distributions at the three codon positions. Exon and intron sequences can then be identified from trends in the ratio of the 3-base periodicity to the background noise along the DNA sequence. Case studies on genes from different organisms show that this method is an effective approach for exon prediction.
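The abstract does not state the formulas, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' implementation. It assumes the 3-base periodicity is measured as the Fourier power at frequency N/3 of the four nucleotide indicator sequences, which for a segment whose length is a multiple of 3 can be computed directly from the nucleotide counts at the three codon positions, and that the background noise is the mean power over all nonzero frequencies (obtained from Parseval's identity). The function names, the fixed-length sliding window, and the default window and step sizes are all hypothetical choices made for illustration.

```python
NUCLEOTIDES = "ACGT"


def codon_position_counts(segment):
    """Count each nucleotide at the three codon positions (index mod 3)."""
    counts = {base: [0, 0, 0] for base in NUCLEOTIDES}
    for i, base in enumerate(segment.upper()):
        if base in counts:  # skip ambiguous symbols such as N
            counts[base][i % 3] += 1
    return counts


def periodicity_power(segment):
    """Fourier power at frequency N/3, summed over the four nucleotides.

    For counts f1, f2, f3 of one nucleotide at the three codon positions,
    |f1 + f2*w + f3*w^2|^2 with w = exp(-2*pi*i/3) equals
    f1^2 + f2^2 + f3^2 - f1*f2 - f1*f3 - f2*f3.
    Assumes len(segment) is a multiple of 3.
    """
    total = 0
    for f1, f2, f3 in codon_position_counts(segment).values():
        total += f1 * f1 + f2 * f2 + f3 * f3 - f1 * f2 - f1 * f3 - f2 * f3
    return total


def background_noise(segment):
    """Mean Fourier power over the nonzero frequencies k = 1..N-1.

    By Parseval's identity the total power of a 0/1 indicator sequence is
    N * N_x, and the power at k = 0 is N_x^2, where N_x is the count of
    nucleotide x. Summing over the four nucleotides gives
    (N^2 - sum of N_x^2) / (N - 1).
    """
    counts = codon_position_counts(segment)
    n = sum(sum(per_position) for per_position in counts.values())
    if n < 2:
        return 0.0
    sum_sq = sum(sum(per_position) ** 2 for per_position in counts.values())
    return (n * n - sum_sq) / (n - 1)


def periodicity_to_noise(segment):
    """Ratio of the 3-base periodicity to the background noise."""
    noise = background_noise(segment)
    return periodicity_power(segment) / noise if noise > 0 else 0.0


def window_ratios(sequence, window=351, step=3):
    """Return (start, ratio) pairs for fixed-length windows along a sequence.

    The window length should be a multiple of 3 (assumed, not checked).
    """
    return [
        (start, periodicity_to_noise(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]
```

In such a sketch, windows with a high ratio would be candidate coding regions and windows with a ratio near the noise level would be candidate introns; any decision threshold would have to be chosen empirically and is not given in the abstract.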