Ponce de Leon Miguel, de Miranda Antonio Basilio, Alvarez-Valin Fernando, Carels Nicolas
Sección Biomatemática, Facultad de Ciencias, Universidad de la República, Iguá, Montevideo, Uruguay.
Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil.
Bioinform Biol Insights. 2014 May 20;8:93-108. doi: 10.4137/BBI.S13161. eCollection 2014.
In this report, we analyzed protein secondary structures in relation to nucleotide statistics at the three codon positions. The purpose of this investigation was to find which properties at the ribosome, tRNA, or protein level could explain the purine bias (Rrr) observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on the ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which favors a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of protein secondary structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional constraints on proteins.
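As an illustration of the kind of per-position nucleotide statistic discussed above, the purine (A or G) fraction at each of the three codon positions of an in-frame coding sequence can be computed as in the following minimal sketch. It is not the authors' pipeline: the function name `purine_fraction_by_codon_position` and the example sequence are hypothetical, and the sequence is assumed to start at the first base of a codon.

```python
PURINES = {"A", "G"}


def purine_fraction_by_codon_position(cds):
    """Return the purine (A or G) fraction at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence (hypothetical input).
    Ambiguous symbols such as N are skipped; a trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete final codon
    fractions = []
    for pos in range(3):
        # Every third base starting at offset `pos` belongs to codon position pos + 1.
        bases = [b for b in seq[pos:usable:3] if b in "ACGT"]
        purines = sum(1 for b in bases if b in PURINES)
        fractions.append(purines / len(bases) if bases else 0.0)
    return tuple(fractions)


# Toy sequence with codons ATG GCT GAA AAA (made up for illustration).
print(purine_fraction_by_codon_position("ATGGCTGAAAAA"))
```

On real data, these fractions would be computed over many coding sequences and compared across positions; a higher purine fraction in the first position than in the other two would be consistent with the bias described in the abstract.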