de Souza S J, Long M, Schoenbach L, Roy S W, Gilbert W
Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA.
Proc Natl Acad Sci U S A. 1996 Dec 10;93(25):14632-6. doi: 10.1073/pnas.93.25.14632.
We analyze the three-dimensional structure of proteins with a computer program that finds regions of sequence that contain module boundaries, defining a module as a segment of polypeptide chain bounded in space by a specific given distance. The program defines a set of "linker regions" that have the property that if an intron were to be placed into each linker region, the protein would be dissected into a set of modules all less than the specified diameter. We test a set of 32 proteins, all of ancient origin, and a corresponding set of 570 intron positions, to ask if there is a statistically significant excess of intron positions within the linker regions. For 28-Å modules, a standard size used historically, we find such an excess, with P < 0.003. This correlation is due neither to a compositional or sequence bias in the linker regions nor to a surface bias in intron positions. Furthermore, a subset of 20 introns, which can be putatively identified as old, lies even more explicitly within the linker regions, with P < 0.0003. Thus, there is a strong correlation between intron positions and three-dimensional structural elements of ancient proteins as expected by the introns-early approach. We then study a range of module diameters and show that, as the diameter varies, significant peaks of correlation appear for module diameters centered at 21.7, 27.6, and 32.9 Å. These preferred module diameters roughly correspond to predicted exon sizes of 15, 22, and 30 residues. Thus, there are significant correlations between introns, modules, and a quantized pattern of the lengths of polypeptide chains, which is the prediction of the "Exon Theory of Genes."
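The two ideas in the abstract, a module bounded by a maximum diameter and an excess of introns inside linker regions, can be illustrated with a minimal Python sketch. This is not the authors' program. It assumes one coordinate per residue (for example, C-alpha atoms), uses a simple greedy split in place of the linker-region construction, and uses a one-sided binomial tail as a stand-in for the significance test, which the abstract does not name. The function names `module_reach`, `greedy_modules`, and `binomial_tail` are hypothetical, and the coordinates are toy values, not data from the paper.

```python
import math

def module_reach(coords, diameter):
    """For each start residue i, return the last residue j such that every
    pair of residues in i..j is no farther apart than `diameter`."""
    n = len(coords)
    reach = []
    for i in range(n):
        j = i
        # Extend while the next residue stays within `diameter` of all of i..j.
        while j + 1 < n and all(
            math.dist(coords[j + 1], coords[k]) <= diameter
            for k in range(i, j + 1)
        ):
            j += 1
        reach.append(j)
    return reach

def greedy_modules(coords, diameter):
    """Split the chain into consecutive segments, each as long as the
    diameter limit allows (a simplification, not the linker-region method)."""
    reach = module_reach(coords, diameter)
    modules, start = [], 0
    while start < len(coords):
        modules.append((start, reach[start]))
        start = reach[start] + 1
    return modules

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k of
    n intron positions in linker regions if each lands there with
    probability p (assumed here: the fraction of sequence in linkers)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )

# Toy chain: ten residues on a straight line, 3.8 Å apart (illustrative only).
toy = [(3.8 * i, 0.0, 0.0) for i in range(10)]
modules = greedy_modules(toy, 10.0)
```

With a real structure, `coords` would come from a coordinate file and `diameter` would be a value such as 28 Å. The greedy split gives only one of many valid dissections, whereas the linker regions described above mark the sequence regions in which module boundaries can fall.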