Tavaré S, Song B
Bull Math Biol. 1989;51(1):95-115. doi: 10.1007/BF02458838.
The stochastic complexity of a database of 365 protein-coding regions is analysed. When the primary sequence is modelled as a spatially homogeneous Markov source, the fit to observed codon preference is very poor. The fit improves substantially when a non-homogeneous model is used. Some implications for the estimation of species phylogeny and substitution rates are discussed.
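The contrast between the two model classes can be illustrated with a small sketch. The abstract does not state the Markov order, the form of non-homogeneity, or the exact complexity criterion used, so everything here is an assumption: a first-order chain, a non-homogeneous variant whose transition matrix depends on codon position (position modulo 3), and a description length approximated as the negative maximised log-likelihood plus a (k/2) log n penalty. The function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def log_likelihood(seq, n_phases):
    """Maximised log-likelihood of a first-order Markov chain whose
    transition matrix depends on position modulo n_phases.
    n_phases=1 is the homogeneous chain; n_phases=3 is an assumed
    codon-position-dependent (non-homogeneous) chain."""
    counts = Counter()
    for i in range(1, len(seq)):
        counts[(i % n_phases, seq[i - 1], seq[i])] += 1
    # Total transitions leaving each (phase, previous base) pair.
    row_totals = Counter()
    for (phase, prev, _), c in counts.items():
        row_totals[(phase, prev)] += c
    # Plug in the maximum-likelihood estimates c / row_total.
    return sum(
        c * math.log(c / row_totals[(phase, prev)])
        for (phase, prev, _), c in counts.items()
    )


def description_length(seq, n_phases):
    """Approximate description length in nats (an assumed criterion):
    -log-likelihood + (k / 2) * log(n), with k free parameters."""
    n_params = n_phases * len(ALPHABET) * (len(ALPHABET) - 1)
    n_transitions = len(seq) - 1
    return -log_likelihood(seq, n_phases) + 0.5 * n_params * math.log(n_transitions)


# Toy sequence (not real data): codons "GC?" whose third base cycles A, C, G, T.
toy_seq = "".join("GC" + ALPHABET[k % 4] for k in range(200))

for phases, label in [(1, "homogeneous"), (3, "codon-position-dependent")]:
    print(label, round(description_length(toy_seq, phases), 1))
```

On a sequence with codon-position structure, the position-dependent chain can reach a shorter description length despite having three times as many parameters, which is the kind of improvement the abstract reports for the non-homogeneous model.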