Holmquist R, Cimino JB
Biosystems. 1980;12(1-2):1-22. doi: 10.1016/0303-2647(80)90034-9.
The method of maximum entropy inference developed by Jaynes can be a particularly useful method for obtaining unbiased estimates of biological parameters when the experimental knowledge about a system can be explicitly formulated. Base transition probabilities between genes, though central to evolutionary theory and understanding, present a difficult estimation problem because the ancestral genes are not experimentally accessible. The necessary estimates must therefore be made on the basis of experimental knowledge other than a direct frequency count of base replacements (A leads to C, for example) between contemporary genes. It is shown how maximum entropy inference together with the experimentally observed fact of compositional fidelity in a given gene family can be used to obtain meaningful gene base transition probabilities at each of the three nucleotide positions within codons. Both symmetric and asymmetric transition probabilities are considered. Tables of these probabilities are given for each codon position for the alpha-hemoglobin, beta-hemoglobin, myoglobin, cytochrome c, and the parvalbumin group genes. Tabular values of the average amino acid composition of these five protein families and the average nucleotide composition of their coding genes at varied codon loci are given. It is thus no longer necessary to assume in theories of evolutionary divergence equimolar base ratios A:C:G:U::1:1:1:1 or that each base has an equal chance of mutating to and being fixed as any one of the other three bases.
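The abstract does not state the optimisation problem explicitly, so the Python sketch below assumes one plausible formulation for illustration only. It is not necessarily the authors' constraint set. In this formulation, F[i, j] is the joint frequency of a replacement i -> j with i != j. Its row sums are fixed to the base composition pi of the ancestral gene, and its column sums are fixed to the same pi, which expresses compositional fidelity. The entropy of F is maximised under those constraints. The solver uses iterative proportional fitting, a standard method for maximum-entropy problems with marginal constraints. This choice of solver is also an assumption. The function name `maxent_replacement_matrix` and the example composition are hypothetical and are not taken from the paper's tables.

```python
import numpy as np

BASES = "ACGU"


def maxent_replacement_matrix(pi, tol=1e-12, max_iter=10_000):
    """Hypothetical sketch: maximum-entropy base replacement probabilities.

    Assumed formulation (illustrative, not the paper's stated method):
    F[i, j] is the joint frequency of a replacement i -> j with i != j.
    Row sums of F equal pi (ancestral composition) and column sums equal pi
    (compositional fidelity). Iterative proportional fitting started from a
    uniform off-diagonal matrix converges to the maximum-entropy F.
    Returns P[i, j] = Pr(j | i, a replacement occurred).
    """
    pi = np.asarray(pi, dtype=float)
    if np.any(pi <= 0):
        raise ValueError("all base frequencies must be positive")
    pi = pi / pi.sum()
    # Necessary condition: base i can only be replaced by the other bases,
    # so pi[i] cannot exceed the combined frequency of the others.
    if pi.max() >= 0.5:
        raise ValueError("every base frequency must be below 0.5")

    n = pi.size
    F = np.ones((n, n)) - np.eye(n)  # uniform on allowed cells, zero diagonal
    for _ in range(max_iter):
        F *= (pi / F.sum(axis=1))[:, None]  # rescale rows to match pi
        F *= (pi / F.sum(axis=0))[None, :]  # rescale columns to match pi
        if np.max(np.abs(F.sum(axis=1) - pi)) < tol:
            break
    # Condition on the ancestral base to get row-stochastic probabilities.
    return F / F.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Hypothetical composition (A, C, G, U), not data from the paper.
    composition = [0.22, 0.30, 0.28, 0.20]
    P = maxent_replacement_matrix(composition)
    print("   " + "      ".join(BASES))
    for base, row in zip(BASES, P):
        print(base, np.round(row, 4))
```

With a uniform composition, every replacement gets probability 1/3. This recovers the equal-chance assumption that the abstract says is no longer necessary. Under this formulation the maximum-entropy joint matrix comes out symmetric, so pi[i] * P[i, j] == pi[j] * P[j, i], even though P itself is generally asymmetric. If self-replacement (the diagonal) were allowed, the same maximum-entropy argument would give the simpler result P[i, j] = pi[j].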