Reversibility and efficiency in coding protein information.

Affiliation

Israel Institute for Advanced Research, Rehovot, Israel.

Publication information

J Theor Biol. 2010 Dec 21;267(4):519-25. doi: 10.1016/j.jtbi.2010.09.025. Epub 2010 Sep 22.

DOI:10.1016/j.jtbi.2010.09.025

PMID:20868696

Abstract

Why does the genetic code have a fixed length? Protein information is transferred by coding each amino acid with a codon, and every codon has length 3 regardless of the amino acid. Hence the most probable and the least probable amino acids receive codewords of equal length. Moreover, the distributions of amino acids found in nature are not uniform, and therefore the efficiency of such codes is sub-optimal. The origins of these apparently inefficient codes are still unclear. In this paper we propose an a priori argument for the energy efficiency of such codes, which results from their reversibility, in contrast to their time inefficiency. Such codes are reversible in the sense that a primitive processor, reading three letters in each step, can always reverse its operation and undo its process. We examine the codes for the distributions of amino acids that exist in nature and show that they could not be both time efficient and reversible. We investigate a family of Zipf-type distributions and present their efficient (non-fixed-length) prefix codes, their graphs, and the condition for their reversibility. We prove that for a large family of such distributions, if the code is time efficient, it cannot be reversible. In other words, if pre-biotic processes demand reversibility, the protein code could not be time efficient. The benefits of reversibility are clear: reversible processes are adiabatic, that is, they dissipate only a very small amount of energy. Such processes must be carried out slowly enough; therefore time efficiency is unimportant. It is reasonable to assume that early biochemical complexes were more inclined towards energy efficiency, with forward and backward processes almost symmetrical.
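The time-inefficiency claim can be illustrated with a small sketch. It is not the paper's construction. It assumes a Zipf distribution with exponent 1 over 20 symbols (standing in for the amino acids), a 4-letter alphabet, and a standard Huffman procedure as the "efficient prefix code". "Time efficiency" is read here as the average number of letters read per symbol. All function names are hypothetical, and the sketch does not model the reversibility condition.

```python
import heapq


def zipf_distribution(n, s=1.0):
    """Probabilities proportional to 1 / rank**s (assumed Zipf-type form)."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def huffman_lengths(probs, arity=4):
    """Codeword lengths of a Huffman prefix code over an alphabet of size `arity`."""
    n = len(probs)
    # Heap entries: (probability, unique tiebreak, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    counter = n
    # Pad with zero-probability dummies so every merge joins exactly `arity` nodes.
    while (len(heap) - 1) % (arity - 1) != 0:
        heap.append((0.0, counter, []))
        counter += 1
    heapq.heapify(heap)
    lengths = [0] * n
    while len(heap) > 1:
        merged_p, merged_syms = 0.0, []
        for _step in range(arity):
            p, _tie, syms = heapq.heappop(heap)
            merged_p += p
            merged_syms.extend(syms)
        # Every symbol under the new node gains one letter.
        for sym in merged_syms:
            lengths[sym] += 1
        heapq.heappush(heap, (merged_p, counter, merged_syms))
        counter += 1
    return lengths


def fixed_length(n, arity=4):
    """Smallest word length L with arity**L >= n."""
    length, capacity = 0, 1
    while capacity < n:
        capacity *= arity
        length += 1
    return length


def expected_length(probs, lengths):
    """Average number of letters per encoded symbol."""
    return sum(p * l for p, l in zip(probs, lengths))


if __name__ == "__main__":
    probs = zipf_distribution(20)            # 20 symbols, exponent 1 (assumption)
    lengths = huffman_lengths(probs, arity=4)
    print("variable-length average:", expected_length(probs, lengths))
    print("fixed codon length:", fixed_length(20, 4))
```

Under these assumptions the variable-length code reads fewer letters per symbol on average than the fixed length of 3, which is the sense in which the abstract calls the fixed-length code time inefficient.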