Computational identification of putative programmed translational frameshift sites.

Author information

Shah Atul A, Giddings Michael C, Parvaz Jasmin B, Gesteland Raymond F, Atkins John F, Ivanov Ivaylo P

Affiliations

Department of Human Genetics, University of Utah, SLC, 84112-5330, USA.

Publication information

Bioinformatics. 2002 Aug;18(8):1046-53. doi: 10.1093/bioinformatics/18.8.1046.

DOI:10.1093/bioinformatics/18.8.1046

PMID:12176827

Abstract

MOTIVATION

In an effort to identify potential programmed frameshift sites by statistical analysis, we explore the hypothesis that selective pressure would have rendered such sites underrepresented in protein-coding sequences. We developed a computer program that compares the observed frequencies of k-length nucleotide subsequences with the frequencies predicted by a zero-order Markov chain determined by the codon bias of the same set of sequences. The program was used to calculate and evaluate the distribution of 7-base oligonucleotides in the 6000+ putative protein-coding sequences of S. cerevisiae, preliminary to laboratory testing of the most highly underrepresented oligos for frameshifting efficiency.
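
The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' program: the function names are hypothetical, the expected count of an in-frame heptamer (codon, codon, first base of the next codon) is assumed to be the product of the two codon frequencies and the frequency of codons starting with that base, and ranking by the observed/expected ratio is an assumed stand-in for whatever significance measure the original program uses.

```python
from collections import Counter
from itertools import product


def codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def heptamer_table(cds_list):
    """Return {heptamer: (observed, expected)} for in-frame heptamers.

    Assumption: under a zero-order (independent-codon) model, the expected
    count of codon1 + codon2 + base is
        P(codon1) * P(codon2) * P(next codon starts with base) * N,
    where N is the total number of in-frame heptamers observed.
    """
    codon_counts = Counter()
    observed = Counter()
    for seq in cds_list:
        cs = codons(seq.upper().replace("T", "U"))
        codon_counts.update(cs)
        # Two full codons plus the first base of the following codon.
        for i in range(len(cs) - 2):
            observed[cs[i] + cs[i + 1] + cs[i + 2][0]] += 1

    total_codons = sum(codon_counts.values())
    codon_freq = {c: n / total_codons for c, n in codon_counts.items()}
    first_base_freq = Counter()
    for codon, freq in codon_freq.items():
        first_base_freq[codon[0]] += freq

    total_heptamers = sum(observed.values())
    table = {}
    # Include heptamers never observed: they may be the most underrepresented.
    for (c1, c2), base in product(product(codon_freq, repeat=2), first_base_freq):
        expected = (codon_freq[c1] * codon_freq[c2]
                    * first_base_freq[base] * total_heptamers)
        table[c1 + c2 + base] = (observed[c1 + c2 + base], expected)
    return table


def most_underrepresented(table, n=10):
    """Rank heptamers by observed/expected ratio, lowest first (assumed metric)."""
    return sorted(table.items(), key=lambda kv: kv[1][0] / kv[1][1])[:n]
```

Calling `most_underrepresented(heptamer_table(cds_list))` on a list of in-frame coding sequences would list candidate heptamers for follow-up; with a real genome, a proper statistical test would be preferable to the raw ratio used here.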

RESULTS

Among the most significant results is the finding that the heptanucleotides CUU-AGG-C and CUU-AGU-U, sites of the programmed +1 translational frameshifts required for the production in yeast of actin filament-binding protein ABP140 and telomerase subunit EST3, respectively, rank among the least represented of phase I heptanucleotides in the coding sequences of S. cerevisiae. Laboratory experiments demonstrated that other underrepresented heptanucleotides identified by the program, for example GGU-CAG-A, are also prone to significant translational frameshifting, suggesting the possibility that genes containing other underrepresented heptamers may also encode transframe products.

AVAILABILITY

The program is available for download from http://www.gesteland.genetics.utah.edu/freqAnalysis

SUPPLEMENTARY INFORMATION

Complete results from the analysis of S. cerevisiae are available on http://www.gesteland.genetics.utah.edu/freqAnalysis
