Doi H, Kitajima M, Watanabe I, Kikuchi Y, Matsuzawa F, Aikawa S, Takiguchi K, Ohno S
Biological Informatics Section, Fujitsu Labs, Ltd., Chiba, Japan.
Proc Natl Acad Sci U S A. 1995 Mar 28;92(7):2879-83. doi: 10.1073/pnas.92.7.2879.
Oligopeptidic permutations of the 20 amino acid residues give rise to proteins of diverse functions. Our long-term goal is to produce a lexicon of oligopeptides, classifying them into at least five categories: (i) ubiquitous, (ii) function specific, (iii) group specific, (iv) species specific, and (v) nonexistent. To begin with, we report on the varying frequencies of individual oligopeptides (dipeptidic to hexapeptidic in length) found among 2862 human proteins, 1942 Saccharomyces cerevisiae proteins, and 2672 Escherichia coli proteins registered in the Swiss-Prot data base (version 29.0, released in June 1994). At all lengths (dipeptides to hexapeptides), homooligopeptides were very prominent among the most frequently occurring varieties in proteins of human and bakers' yeast origins. However, this was not the case with E. coli. While all of the expected 20³ varieties of tripeptides were found among human proteins, three tripeptides (Cys-Cys-Trp, Trp-Trp-Cys, and Trp-Trp-His) were missing from the bakers' yeast proteins. Three tripeptides (Cys-Ile-Trp, Cys-Met-Tyr, and Cys-Trp-Trp) were also absent from E. coli proteins. Inasmuch as the Swiss-Prot data base already contained 67% of the expected total of 4000 E. coli proteins, it is virtually certain that 96,000 varieties of hexapeptides containing at least one or another of the three missing tripeptides noted above shall be nonexistent in E. coli. Furthermore, the observation of missing tripeptides in the bakers' yeast proteins suggests that nonexistent hexapeptides shall be highly phylum specific. Because of the sample size, only a small fraction of the 20⁶ varieties of hexapeptides were expected to be encountered in the present survey. Indeed, only 1.2-1.5% of the possible hexapeptides were found, and the average copy number of observed hexapeptides varied between 1.06 and 1.25. Nevertheless, 33 varieties of hexapeptides occurred in 102-169 copies among human proteins. Furthermore, 15 of the 33 varieties contained such rarely used residues as Tyr, His, Cys, and Trp.
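The kind of tally described in the abstract can be illustrated with a minimal Python sketch. It assumes protein sequences are supplied as strings of one-letter residue codes and that oligopeptides are counted as overlapping windows; the abstract does not state how the authors implemented their counting, and the function names and toy sequences below are hypothetical. The last function reproduces the arithmetic behind the 96,000 figure: each missing tripeptide can occupy 4 positions in a hexapeptide, with the other 3 residues free.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acid residues, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_oligopeptides(sequences, k):
    """Count every overlapping window of length k across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            # Skip windows containing non-standard symbols such as X, B or Z.
            if all(res in AMINO_ACIDS for res in window):
                counts[window] += 1
    return counts


def missing_oligopeptides(counts, k):
    """List the k-residue oligopeptides that never occur in the counts."""
    all_kmers = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [kmer for kmer in all_kmers if kmer not in counts]


def kmers_containing(n_missing, k=6, m=3):
    """Number of k-mers containing one of n_missing absent m-mers.

    Each m-mer can start at (k - m + 1) positions and the remaining
    (k - m) residues are free. K-mers that contain a missing m-mer more
    than once are counted more than once, so this slightly overcounts.
    """
    return n_missing * (k - m + 1) * len(AMINO_ACIDS) ** (k - m)


# Hypothetical toy sequences, not taken from Swiss-Prot.
toy_proteins = ["MKKKKKLA", "ACDKKKW"]
tripeptide_counts = count_oligopeptides(toy_proteins, 3)
absent_tripeptides = missing_oligopeptides(tripeptide_counts, 3)
```

With three missing tripeptides, `kmers_containing(3)` evaluates to 3 × 4 × 20³ = 96,000, matching the figure quoted for E. coli. The total hexapeptide space is `20 ** 6` = 64,000,000, so the observed 1.2-1.5% corresponds to roughly 768,000 to 960,000 distinct hexapeptides.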