Hatton Leslie, Warr Gregory
Faculty of Science, Engineering and Computing, Kingston University, London, UK.
Medical University of South Carolina, Charleston, South Carolina, USA.
PLoS One. 2015 May 13;10(5):e0125663. doi: 10.1371/journal.pone.0125663. eCollection 2015.
That the physicochemical properties of amino acids constrain the structure, function and evolution of proteins is not in doubt. However, principles derived from information theory may also set bounds on the structure (and thus also the evolution) of proteins. Here we analyze the global properties of the full set of proteins in release 13-11 of the SwissProt database and, by experimentally testing predictions from information theory, show that their collective structure exhibits properties consistent with their being guided by a conservation principle. This principle (Conservation of Information) defines the global properties of systems composed of discrete components, each of which is in turn assembled from smaller discrete pieces. In the system of proteins, each protein is a component, and each protein is assembled from amino acids. Central to this principle is the relationship between the unique amino acid count and the total length of a protein, and its implications for both average protein length and the occurrence of proteins with specific unique amino acid counts. The unique amino acid count is simply the number of distinct amino acids (including post-translationally modified ones) that occur in a protein, regardless of how many times each one occurs in the sequence. Conservation of Information is independent of the physicochemical properties of the amino acids and so does not operate at the local level, where the influences of natural selection are manifest in the well-understood variety of protein structure and function. Rather, this analysis implies that Conservation of Information would define the global bounds within which the whole system of proteins is constrained; it thus appears to constrain evolution at a level different from natural selection, a conclusion that seems counter-intuitive but is supported by the studies described herein.
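The unique amino acid count, and its pairing with protein length, can be illustrated with a short Python sketch. It covers only the bookkeeping the analysis rests on: distinct residue types per protein, then the number of proteins and their mean length for each unique count. It does not implement the Conservation of Information model or the authors' statistical test. The function names, the toy sequences and the token "pS" for a modified residue are hypothetical. In a real analysis the sequences would be read from a SwissProt release.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def unique_amino_acid_count(residues: Sequence[str]) -> int:
    """Number of distinct residue types in one protein.

    Each distinct symbol counts once, however often it repeats.
    A one-letter string works directly. A list of tokens lets a
    post-translationally modified residue (here the hypothetical token
    "pS" for phosphoserine) count as a type of its own.
    """
    return len(set(residues))


def summarize_by_unique_count(
    proteins: Iterable[Sequence[str]],
) -> Dict[int, Tuple[int, float]]:
    """Map each unique amino acid count to (number of proteins, mean length)."""
    lengths = defaultdict(list)
    for residues in proteins:
        lengths[unique_amino_acid_count(residues)].append(len(residues))
    # Sort by unique count so the table reads in increasing order.
    return {
        count: (len(ls), sum(ls) / len(ls))
        for count, ls in sorted(lengths.items())
    }


if __name__ == "__main__":
    # Made-up toy sequences, not taken from SwissProt.
    toy = ["MKTAYIAK", "MAAAAK", "GG", "MKKLLG", "MKLG",
           ["M", "S", "pS", "S"]]
    for count, (n, mean_len) in summarize_by_unique_count(toy).items():
        print(f"unique={count:2d}  proteins={n}  mean length={mean_len:.1f}")
```

For the toy input, "MAAAAK" and the token list `["M", "S", "pS", "S"]` both have a unique count of 3 despite different compositions, because repeats do not add to the count. Applied to a full database, the occurrence column and the mean-length column are the two global quantities the abstract says the principle bears on.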