Battail Gérard
École nationale supérieure des Télécommunications de Paris, France.
Biosystems. 2019 Oct;184:103987. doi: 10.1016/j.biosystems.2019.103987. Epub 2019 Jul 8.
Shannon's channel coding theorem (1948), a major result of information theory, paradoxically states that error-free communication is possible over an unreliable channel. Since then, engineers have developed many error-correcting codes and decoding algorithms, but performance close to the predicted limit was not achieved until the early 1990s. Many communication systems, such as mobile telephony and terrestrial digital television, would not exist without error-correcting codes. This article first explains how they work, without mathematical formalism. An error-correcting code is a small subset of some set of messages. Within this subset, the messages differ enough from one another to be exactly identified even if a number of their symbols, up to a certain limit, are changed. Beyond this limit, another message may be erroneously identified. An error-correcting code can be interpreted as a set of messages subject to constraints that make their symbols mutually dependent. Although mathematical constraints are convenient in engineering, constraints of any other kind, possibly of natural origin, can generate error-correcting codes. Biologists implicitly assume that genomes were conserved over geological ages, without realizing that this is impossible without error-correcting means. Symbol errors occur during replication of a genome; chemical reactions and radiation are other sources of errors. In the absence of correction, the number of errors increases with time. A genomic code will exactly regenerate the genome provided decoding is attempted after a short enough time interval. If the number of errors is too large, however, the decoded genome will differ from the initial one and a mutation will occur. Decoding attempted periodically, and frequently enough, will thus conserve a genome except for very infrequent mutations.
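The description above of a code as a small subset of sufficiently different messages, decoded by identifying the closest member of the subset, can be sketched in a few lines. This is only a toy illustration, not the genomic code the article postulates: the four-word codebook and the names `CODEBOOK`, `hamming`, `decode` and `flip` are assumptions made for the example.

```python
# Toy illustration (hypothetical codebook): 4 codewords chosen among the
# 32 binary words of length 5, so the code is a small subset of all messages.
CODEBOOK = ("00000", "01011", "10101", "11110")


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(codebook):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(
        hamming(a, b)
        for i, a in enumerate(codebook)
        for b in codebook[i + 1:]
    )


def decode(word, codebook=CODEBOOK):
    """Nearest-codeword decoding: return the codeword closest to `word`."""
    return min(codebook, key=lambda c: hamming(word, c))


def flip(word, positions):
    """Simulate symbol errors by inverting the bits at `positions`."""
    bits = list(word)
    for p in positions:
        bits[p] = "1" if bits[p] == "0" else "0"
    return "".join(bits)


original = "01011"

# Codewords differ in at least 3 positions, so any single error is corrected.
one_error = decode(flip(original, [0]))

# Two errors accumulated before decoding exceed the limit: another codeword
# is identified, the analogue of a mutation.
two_errors = decode(flip(original, [0, 2]))

# The same two errors, with a decoding attempt between them, are both corrected.
periodic = decode(flip(decode(flip(original, [0])), [2]))

print(min_distance(CODEBOOK), one_error, two_errors, periodic)
```

In this sketch the same two symbol errors either leave the message intact or produce a different codeword, depending only on whether decoding was attempted between them.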
The better conservation of very ancient parts of genomes, such as the HOX genes, cannot be explained unless one assumes that a genomic error-correcting code exists and results from a stepwise encoding: a first encoding was followed later by a second one in which new information and the result of the first encoding were jointly encoded, and this process was repeated several times. The eventual result is an overall code made of nested components, in which the older a piece of information is, the better it is protected. Organic codes in Barbieri's sense result from the same process and have the same structure. Any new organic code induces new genomic constraints, hence new components in a nested system of codes. Organic codes may thus be identified with the system of nested error-correcting codes needed to conserve the genetic information. A majority of biologists deny that information theory can be useful to them. It is shown, on the contrary, that the living world cannot be understood if the scientific concept of information is ignored. Heredity makes the present communicate with the past and, as a communication process, falls within the scope of information theory, which is thus a necessary basis of biology alongside physics and chemistry. The nested genomic error-correcting codes needed for conserving the genetic information account for the hierarchical taxonomy that structures the living world. Moreover, the main features of biological evolution, including its trend towards increasing complexity, find an explanation within this framework. Incorporating the scientific concept of information, and the science based on it, into the foundations of biology can widely renew the discipline, but it meets epistemological difficulties that must be overcome.
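The stepwise encoding described above, in which the result of an earlier encoding is encoded again jointly with new information, can be illustrated with a deliberately simple stand-in. The sketch below assumes threefold repetition codes at both levels; this choice, and the names `rep_encode`, `rep_decode`, `nested_encode` and `nested_decode`, are hypothetical and are not taken from the article.

```python
def rep_encode(bits, n=3):
    """Repetition code: write each bit n times."""
    return [b for b in bits for _ in range(n)]


def rep_decode(bits, n=3):
    """Majority vote over each consecutive block of n symbols."""
    return [int(sum(bits[i:i + n]) * 2 > n) for i in range(0, len(bits), n)]


def nested_encode(old, new):
    """First encode the old information, then jointly encode that
    result together with the new information."""
    inner = rep_encode(old)
    return rep_encode(inner + new)


def nested_decode(word, n_old):
    """Undo the outer encoding, then the inner one for the old part."""
    outer = rep_decode(word)
    inner, new = outer[:3 * n_old], outer[3 * n_old:]
    return rep_decode(inner), new


def corrupt(word, positions):
    """Invert the symbols at the given positions."""
    return [b ^ 1 if i in positions else b for i, b in enumerate(word)]


# One old bit and one new bit: the old bit ends up spread over 9 symbols,
# the new bit over only 3.
word = nested_encode([1], [0])

# Two errors in a block carrying old information are still corrected...
old_hit = nested_decode(corrupt(word, {0, 1}), n_old=1)

# ...while two errors in the block carrying new information are not.
new_hit = nested_decode(corrupt(word, {9, 10}), n_old=1)

print(word, old_hit, new_hit)
```

Under these assumptions the older bit survives an error pattern that corrupts the newer one, which is the qualitative behaviour the nested-code hypothesis relies on.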