Dimayacyac Jose Rafael, Wu Shanyun, Jiang Daohan, Pennell Matt
Department of Zoology, University of British Columbia, Canada.
Michael Smith Laboratories, University of British Columbia, Canada.
bioRxiv. 2023 Aug 17:2023.02.09.527893. doi: 10.1101/2023.02.09.527893.
Phylogenetic comparative methods are increasingly used to test hypotheses about the evolutionary processes that drive divergence in gene expression among species. However, it is unknown whether the distributional assumptions of phylogenetic models designed for quantitative phenotypic traits are realistic for expression data. This matters because the reliability of conclusions from phylogenetic comparative studies of gene expression may depend on whether the data are well described by the chosen model. To evaluate this, we first fit several phylogenetic models of trait evolution to 8 previously published comparative expression datasets, comprising a total of 54,774 genes with 145,927 unique gene-tissue combinations. Using a previously developed approach, we then assessed how well the best model of the set described the data in an absolute (not just relative) sense. First, we find that Ornstein-Uhlenbeck models, in which expression values are constrained around an optimum, were the preferred model for 66% of gene-tissue combinations. Second, we find that for 61% of gene-tissue combinations, the best-fit model of the set performed well; the remainder performed poorly according to at least one of the test statistics we examined. Third, we find that when simple models perform poorly, this typically appears to result from failing to fully account for heterogeneity in the rate of evolution. We advocate that assessment of model performance should become a routine component of phylogenetic comparative expression studies; doing so can improve the reliability of inferences and inspire the development of novel models.
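To make "constrained around an optimum" concrete, the sketch below compares the expected trait value and variance after time t along a single lineage under an Ornstein-Uhlenbeck (OU) process, dX = alpha (theta - X) dt + sigma dW, and under Brownian motion (BM), the usual unconstrained baseline. This is an illustration of the textbook single-lineage processes only, not the authors' fitting or model-assessment pipeline; whether BM was among the fitted models is an assumption here, and the function names, parameter names (x0, sigma2, alpha, theta) and values are hypothetical.

```python
import math


def bm_moments(x0, sigma2, t):
    """Mean and variance of a trait after time t under Brownian motion.

    The mean stays at the starting value and the variance grows without bound.
    """
    return x0, sigma2 * t


def ou_moments(x0, sigma2, alpha, theta, t):
    """Mean and variance after time t under an OU process
    dX = alpha * (theta - X) dt + sigma dW, starting from X(0) = x0.

    alpha is the strength of pull toward the optimum theta; sigma2 = sigma**2.
    """
    decay = math.exp(-alpha * t)
    # The expected value relaxes exponentially from x0 toward the optimum theta.
    mean = theta + (x0 - theta) * decay
    # The variance saturates at the stationary value sigma2 / (2 * alpha).
    var = sigma2 / (2 * alpha) * (1 - math.exp(-2 * alpha * t))
    return mean, var


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    x0, sigma2, alpha, theta = 0.0, 1.0, 0.5, 5.0
    for t in (1.0, 10.0, 100.0):
        print(f"t={t:>5}: BM {bm_moments(x0, sigma2, t)}, "
              f"OU {ou_moments(x0, sigma2, alpha, theta, t)}")
```

Running it shows the design point: as t grows, the BM variance keeps increasing linearly, while the OU mean converges on theta and its variance levels off at sigma2 / (2 * alpha); as alpha approaches zero, the OU moments reduce to the BM ones.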