6-5-607 Miyanodai, Sakura, Chiba, 285-0857, Japan.
BMC Evol Biol. 2013 Nov 21;13:257. doi: 10.1186/1471-2148-13-257.
Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family. Hence, a variety of empirical amino acid substitution rate matrices have had to be estimated for phylogenetic analysis: JTT, WAG, and LG for nuclear proteins, mtREV for mitochondrial proteins, cpREV10 and cpREV64 for chloroplast-encoded proteins, and FLU for influenza proteins. In a mechanistic codon substitution model, by contrast, each codon substitution rate is proportional to the product of a codon mutation rate and a fixation ratio that depends on the type of amino acid replacement, so mutation rates and the strength of selective constraint on amino acids can be tailored to each protein family with 11 additional parameters. As a result, in the evolutionary analysis of codon sequences it outperforms codon substitution models equivalent to empirical amino acid substitution matrices. Is it superior even for amino acid sequences, in which synonymous substitutions cannot be identified?
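The proportionality described above can be written schematically as follows. This is only a sketch of the stated structure: the symbols $R$, $M$, $F$, and $a_\mu$ are notation assumed here for illustration, not taken from the abstract, and the normalization and exact functional form of the fixation ratio are left unspecified.

```latex
% Assumed notation (not from the abstract):
%   R_{\mu\nu}  : substitution rate from codon \mu to codon \nu
%   M_{\mu\nu}  : codon mutation rate from \mu to \nu
%   a_\mu       : amino acid encoded by codon \mu
%   F_{ab}      : fixation ratio for replacing amino acid a by amino acid b
R_{\mu\nu} \;\propto\; M_{\mu\nu}\, F_{a_\mu a_\nu}, \qquad \mu \neq \nu
```

Written this way, the mutation term $M$ carries the nucleotide-level tendencies and the fixation term $F$ carries the selective constraint on the amino acid replacement, which is what allows the two to be adjusted separately for each protein family.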
Nucleotide mutations are assumed to occur independently of codon position, but multiple nucleotide changes in an infinitesimal time interval are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene as a linear function of a given estimate of selective constraints; these estimates were obtained by maximizing the likelihood of an empirical amino acid or codon substitution frequency matrix (JTT, WAG, LG, or KHG). For all three phylogenetic trees examined (mitochondrial, chloroplast, and influenza-A hemagglutinin proteins), the mechanistic codon substitution model with the assumption of equal codon usage yields better Akaike and Bayesian information criterion values than the empirical amino acid substitution models mtREV, cpREV64, and FLU, respectively, each of which was designed specifically for the corresponding protein family. Variation of selective constraint across sites fits the datasets significantly better than variable codon mutation rates, confirming that the substitution rate variation across sites detected by amino acid substitution models is caused primarily by variation of selective constraint against amino acid substitutions rather than by variation of codon mutation rate.
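For readers unfamiliar with the two criteria used in the comparison, the following minimal Python sketch shows how AIC and BIC are computed from a maximized log-likelihood and a parameter count, using the standard definitions. The function names, the log-likelihoods, the parameter counts, and the number of sites are hypothetical placeholders chosen for illustration; they are not results from the study, and taking the number of alignment sites as the sample size for BIC is an assumption.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model(scores: dict) -> str:
    """Return the name of the model with the lowest criterion value."""
    return min(scores, key=scores.get)


# Hypothetical placeholder fits, NOT values reported in the study.
# "k" counts only the model-specific free parameters in this toy comparison.
n_sites = 3000  # assumed sample size: number of alignment sites
fits = {
    "empirical_aa": {"lnL": -52000.0, "k": 0},
    "mechanistic_codon": {"lnL": -51500.0, "k": 11},
}

aic_scores = {name: aic(f["lnL"], f["k"]) for name, f in fits.items()}
bic_scores = {name: bic(f["lnL"], f["k"], n_sites) for name, f in fits.items()}

print("Preferred by AIC:", best_model(aic_scores))
print("Preferred by BIC:", best_model(bic_scores))
```

Both criteria penalize extra parameters, BIC more heavily for large samples, so a model with additional parameters is preferred only if its gain in log-likelihood outweighs the penalty.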
The mechanistic codon substitution model is superior to amino acid substitution models even in the evolutionary analysis of protein sequences.