Improving the efficiency of the genetic code by varying the codon length--the perfect genetic code.

Author information

Doig A J

Affiliation

Department of Biomolecular Sciences, UMIST, Manchester, M60 1QD, U.K.

Publication information

J Theor Biol. 1997 Oct 7;188(3):355-60. doi: 10.1006/jtbi.1997.0489.

DOI:10.1006/jtbi.1997.0489

PMID:9344740

Abstract

The function of DNA is to specify protein sequences. The four-base "alphabet" used in nucleic acids is translated to the 20-amino-acid alphabet of proteins (plus a stop signal) via the genetic code. The code is neither overlapping nor punctuated, but has mRNA sequences read in successive triplet codons until reaching a stop codon. The true genetic code uses three bases for every amino acid. The efficiency of the genetic code can be significantly increased if the requirement for a fixed codon length is dropped so that the more common amino acids have shorter codon lengths and rare amino acids have longer codon lengths. More efficient codes can be derived using the Shannon-Fano and Huffman coding algorithms. The compression achieved using a Huffman code cannot be improved upon. I have used these algorithms to derive efficient codes for representing protein sequences using both two and four bases. The length of DNA required to specify the complete set of protein sequences could be significantly shorter if translation used a variable codon length. The restriction to a fixed codon length of three bases means that it takes 42% more DNA than the minimum necessary, and the genetic code is 70% efficient. One can think of many reasons why this maximally efficient code has not evolved: there is very little redundancy, so almost any mutation causes an amino acid change. Many mutations will be potentially lethal frame-shift mutations, if the mutation leads to a change in codon length. It would be more difficult for the machinery of translation to cope with a variable codon length. Nevertheless, in the strict and narrow sense of coding for protein sequences using the minimum length of DNA possible, the Huffman code derived here is perfect.
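The Huffman construction mentioned in the abstract can be sketched in a few lines. The following is a minimal illustration, not the author's implementation: the function names (`huffman_code_lengths`, `mean_codon_length`) and the five-symbol toy weights are assumptions made for this example, and the weights are not real amino acid frequencies. With `radix=4` the code alphabet has four letters (one per base); with `radix=2` it has two.

```python
import heapq
import itertools


def huffman_code_lengths(weights, radix=4):
    """Return {symbol: codeword length} for a radix-ary Huffman code.

    `weights` maps each symbol to a non-negative frequency or count.
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    if len(weights) == 1:
        return {symbol: 1 for symbol in weights}

    tie_breaker = itertools.count()  # keeps heap entries comparable on equal weights
    heap = [(w, next(tie_breaker), [s]) for s, w in weights.items()]

    # Pad with zero-weight dummy nodes so every merge combines exactly
    # `radix` nodes; this is required for optimality when radix > 2.
    while (len(heap) - 1) % (radix - 1) != 0:
        heap.append((0, next(tie_breaker), []))
    heapq.heapify(heap)

    lengths = {s: 0 for s in weights}
    while len(heap) > 1:
        merged_weight = 0
        merged_symbols = []
        for _ in range(radix):
            weight, _tie, symbols = heapq.heappop(heap)
            merged_weight += weight
            merged_symbols.extend(symbols)
        # Every symbol under the new parent node gains one code letter.
        for s in merged_symbols:
            lengths[s] += 1
        heapq.heappush(heap, (merged_weight, next(tie_breaker), merged_symbols))
    return lengths


def mean_codon_length(weights, lengths):
    """Frequency-weighted mean codeword length."""
    total = sum(weights.values())
    return sum(weights[s] * lengths[s] for s in weights) / total


if __name__ == "__main__":
    # Hypothetical toy weights, chosen only to show the mechanics.
    toy = {"A": 4, "B": 2, "C": 2, "D": 1, "E": 1}
    for radix in (2, 4):
        lengths = huffman_code_lengths(toy, radix=radix)
        print(radix, lengths, mean_codon_length(toy, lengths))
```

Applied to measured amino acid frequencies plus a stop signal (21 symbols), the same procedure would give the mean variable codon length that the abstract compares with the fixed length of three bases. The two figures quoted there are consistent with each other: needing 42% more DNA than the minimum corresponds to an efficiency of 1/1.42 ≈ 0.70.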


Similar articles

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

An algebraic hypothesis about the primeval genetic code architecture.

Math Biosci. 2009 Sep;221(1):60-76. doi: 10.1016/j.mbs.2009.07.001. Epub 2009 Jul 14.

An extended RNA code and its relationship to the standard genetic code: an algebraic and geometrical approach.

Bull Math Biol. 2007 Jan;69(1):215-43. doi: 10.1007/s11538-006-9119-3. Epub 2006 Nov 2.

A complementary circular code in the protein coding genes.

J Theor Biol. 1996 Sep 7;182(1):45-58. doi: 10.1006/jtbi.1996.0142.

The fourfold way of the genetic code.

Biosystems. 2009 Nov;98(2):105-14. doi: 10.1016/j.biosystems.2009.07.006. Epub 2009 Jul 28.

The genetic code--more than just a table.

Cell Biochem Biophys. 2009;55(2):107-16. doi: 10.1007/s12013-009-9060-9. Epub 2009 Jul 29.

The use of logistic models for the analysis of codon frequencies of DNA sequences in terms of explanatory variables.

Biometrics. 1994 Dec;50(4):1054-63.

Study of the genetic code adaptability by means of a genetic algorithm.

J Theor Biol. 2010 Jun 7;264(3):854-65. doi: 10.1016/j.jtbi.2010.02.041. Epub 2010 Feb 26.

Reversibility and efficiency in coding protein information.

J Theor Biol. 2010 Dec 21;267(4):519-25. doi: 10.1016/j.jtbi.2010.09.025. Epub 2010 Sep 22.

Cited by

Codon and Reverse Codon: A Theoretical Approach to Reinterpret the Genetic Code Table.

Cureus. 2023 Nov 10;15(11):e48598. doi: 10.7759/cureus.48598. eCollection 2023 Nov.

Information theoretic perspective on genome clustering.

Saudi J Biol Sci. 2021 Mar;28(3):1867-1889. doi: 10.1016/j.sjbs.2020.12.039. Epub 2020 Dec 31.

Trends to store digital data in DNA: an overview.

Mol Biol Rep. 2018 Oct;45(5):1479-1490. doi: 10.1007/s11033-018-4280-y. Epub 2018 Aug 2.

New Trends of Digital Data Storage in DNA.

Biomed Res Int. 2016;2016:8072463. doi: 10.1155/2016/8072463. Epub 2016 Sep 5.
