Gregory J. Gatto, Jeremy M. Berg
Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.
Genome Res. 2003 Apr;13(4):617-23. doi: 10.1101/gr.667603.
The availability of complete genome sequences enables the statistical analysis of sequence features without significant database-imposed bias. The carboxyl termini of proteins often contain regions associated with protein targeting and enhanced translational termination. We analyzed the frequency of occurrence of C-terminal tripeptides in representative archaeal, bacterial, and eukaryotic genomes. The sequence distribution in prokaryotic genomes nearly matches that generated by the randomization of the observed tripeptide set. In contrast, eukaryotic genomes contain large numbers of overrepresented sequences. Some of these correspond to highly repeated sequences from either duplicated endogenous genes or transposon open reading frames. Gratifyingly, others represent previously known targeting signals or sequences associated with an increase in translational termination efficiency. However, a number of overrepresented tripeptides have not been previously noted and may represent novel functional sequences. For example, the sequence XSS may enhance translational termination efficiency in plants, whereas FWC may be a targeting or processing signal for certain amino acid permeases in yeast.
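The comparison described in the abstract, observed C-terminal tripeptide counts against those from a randomized tripeptide set, can be sketched as follows. This is only an illustrative sketch, not the authors' procedure. It assumes that "randomization" means shuffling the residues at each of the three positions independently across all proteins. The function names, the number of shuffles, and the `min_ratio` cutoff are hypothetical choices made for the example.

```python
import random
from collections import Counter


def c_terminal_tripeptides(proteins):
    """Return the last three residues (uppercased) of each protein of length >= 3."""
    return [p[-3:].upper() for p in proteins if len(p) >= 3]


def shuffle_positions(tripeptides, rng):
    """Shuffle each of the three positions independently across the set.

    Assumed randomization scheme: per-position residue composition is kept,
    but any coupling between positions is destroyed.
    """
    columns = [list(col) for col in zip(*tripeptides)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(residues) for residues in zip(*columns)]


def overrepresented(proteins, n_shuffles=200, min_ratio=3.0, seed=0):
    """Map tripeptide -> (observed count, mean count over shuffles).

    A tripeptide is reported when its observed count is at least
    min_ratio times its mean randomized count (hypothetical cutoff).
    """
    rng = random.Random(seed)
    observed = c_terminal_tripeptides(proteins)
    counts = Counter(observed)

    # Accumulate counts over many randomized tripeptide sets.
    randomized = Counter()
    for _ in range(n_shuffles):
        randomized.update(shuffle_positions(observed, rng))

    result = {}
    for tripeptide, n in counts.items():
        mean = randomized[tripeptide] / n_shuffles
        # Floor the mean so tripeptides never seen in a shuffle do not divide by zero.
        if n >= min_ratio * max(mean, 1.0 / n_shuffles):
            result[tripeptide] = (n, mean)
    return result
```

Calling `overrepresented(sequences)` on a list of protein sequences from one genome returns the tripeptides whose observed counts exceed the randomized expectation by the chosen factor. A real analysis would also need to handle duplicated genes and transposon open reading frames, which the abstract identifies as a source of overrepresentation.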