Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA.
Evol Bioinform Online. 2013 Jul 28;9:301-16. doi: 10.4137/EBO.S11600. Print 2013.
Variation in substitution rates across a phylogeny can indicate shifts in the evolutionary dynamics of a protein or of non-protein-coding regions. One way to understand these signals is to seek the phenotypic correlates of rate variation. Here, we extended a previously published likelihood method designed to detect associations between genotypic evolutionary rate and phenotype over a phylogeny. In simulations with two discrete phenotype categories, the method has a low false-positive rate and detects more than 80% of true positives when the tree length is three or greater and the substitution rate changes three-fold or more given the phenotype. In addition, we extended the test from two to four phenotype categories and evaluated its performance. We then applied the method to two major hypotheses for rate variation in the mitochondrial genome of primates (longevity and generation time), as well as to body mass, which is correlated with many aspects of life history, using three phenotype categories obtained by discretizing continuous values. Similar to previous results for mammals, we find that the majority of mitochondrial protein-coding genes show associations consistent with the longevity and body mass predictions, and that the predominant signal of association comes from the third codon position. We also find a significant association between maximum lifespan and the evolutionary rate of the mtDNA control region. In contrast, 24 protein-coding genes from the nuclear genome do not show a consistent pattern of association, which is inconsistent with the generation time hypothesis. These results show that the extended method can robustly identify genotype-phenotype associations with up to at least four phenotypic categories, and demonstrate its successful application to the study of factors affecting neutral evolutionary rate in protein-coding and non-coding loci.
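As an illustration of the discretization step mentioned above, the following minimal sketch bins a continuous phenotype into three categories. The abstract does not state which binning rule was used, so the choice of equal-frequency (tertile) cut points here is an assumption, and the function name `discretize` and the lifespan values are hypothetical and for illustration only.

```python
from statistics import quantiles


def discretize(values, n_categories=3):
    """Map each value to a category index 0..n_categories-1.

    Assumption: categories are defined by equal-frequency quantile cut
    points; the published method may have used a different rule.
    """
    # quantiles() returns n_categories - 1 cut points
    cuts = quantiles(values, n=n_categories)
    # A value's category is the number of cut points it exceeds
    return [sum(v > c for c in cuts) for v in values]


# Hypothetical maximum lifespans in years (not data from the study)
lifespans = [12.0, 18.5, 25.0, 31.0, 40.0, 59.5]
categories = discretize(lifespans)
print(categories)
```

Each species then carries a discrete label (low, medium, or high) that a rate-association test over the phylogeny could take as its phenotype state.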