Savageau M A
Proc Natl Acad Sci U S A. 1986 Mar;83(5):1198-202. doi: 10.1073/pnas.83.5.1198.
Initial attempts to correlate the distribution of gene density (number of gene loci per unit length on the linkage map) with the distribution of lengths of coding sequences have led to the observation that 46% of approximately 1000 sampled proteins in Escherichia coli have molecular masses of n × 14,000 ± 2,500 daltons (n = 1, 2, ...). This clustering around multiples of 14,000 contrasts with the 36% one would expect in these ranges if the sizes were uniformly distributed. The entire distribution is well fit by a sum of normal or lognormal distributions located at multiples of 14,000, which suggests that the percentage of E. coli proteins governed by the underlying sizing mechanism is much greater than 50%. Clustering of protein molecular sizes around multiples of a unit size also is suggested by the distribution of well-characterized HeLa cell proteins. The distribution of gene lengths for E. coli suggests regular clustering, which implies that the clustering of protein molecular masses is not an artifact of the molecular mass measurement by gel electrophoresis. These observations suggest the existence of a fundamental structural unit. The rather uniform size of this structural unit (without any apparent sequence homology) suggests that a general principle such as geometrical or physical optimization at the DNA or protein level is responsible. This suggestion is discussed in relation to experimental evidence for the domain structure of proteins and to existing hypotheses that attempt to account for these domains. Microevolution would appear to be accommodated by incremental changes within this fundamental unit, whereas macroevolution would appear to involve "quantum" changes to the next stable size of protein.
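The 36% baseline follows from simple arithmetic: each window n × 14,000 ± 2,500 is 5,000 daltons wide and the windows repeat every 14,000 daltons, so a uniform spread of masses would place about 5,000 / 14,000 ≈ 0.357 of them inside a window. The minimal Python sketch below shows that calculation and the corresponding count for a list of masses. The function names are hypothetical, the masses in the usage line are illustrative and not data from the study, and the baseline assumes masses are spread evenly over many 14,000-dalton periods (it ignores the edge effect below the first window, which starts at 11,500 daltons).

```python
def near_multiple(mass_da: float, unit: float = 14_000, tol: float = 2_500) -> bool:
    """Return True if mass_da lies within +/- tol of n * unit for some n >= 1."""
    # Nearest multiple of the unit size, but never n = 0 (n starts at 1 in the abstract)
    n = max(1, round(mass_da / unit))
    return abs(mass_da - n * unit) <= tol


def clustered_fraction(masses, unit: float = 14_000, tol: float = 2_500) -> float:
    """Fraction of masses falling inside any n * unit +/- tol window."""
    hits = sum(near_multiple(m, unit, tol) for m in masses)
    return hits / len(masses)


def uniform_expectation(unit: float = 14_000, tol: float = 2_500) -> float:
    """Expected fraction under a uniform spread: window width / period length.

    Assumes masses span many periods, so edge effects are negligible.
    """
    return 2 * tol / unit


if __name__ == "__main__":
    # Illustrative masses only (hypothetical values, not from the study)
    sample = [14_000, 27_000, 20_000, 42_500]
    print(f"clustered fraction: {clustered_fraction(sample):.2f}")
    print(f"uniform baseline:   {uniform_expectation():.1%}")
```

For the four illustrative masses, three lie within 2,500 daltons of a multiple of 14,000 (14,000, 28,000 and 42,000), and the baseline rounds to the 36% quoted in the abstract.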