Genome Science and Technology, University of Tennessee, Knoxville, Tennessee, USA.
Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.
BMC Genomics. 2020 May 20;21(1):370. doi: 10.1186/s12864-020-6761-3.
Researchers often measure changes in gene expression across conditions to better understand the shared functional roles and regulatory mechanisms of different genes. Analogously, comparing gene expression across species can improve our understanding of the evolutionary processes shaping both individual genes and functional pathways. One area of interest is identifying genes that show signals of coevolution, which can also indicate potential functional similarity, much as co-expression analysis across conditions does within a single species. However, as with any trait, comparisons of gene expression across species can be confounded by the non-independence of species due to shared ancestry, making standard hypothesis testing inappropriate.
We compared RNA-Seq data across 18 fungal species using a multivariate Brownian Motion phylogenetic comparative method (PCM), which allowed us to quantify coevolution between protein pairs while directly accounting for the shared ancestry of the species. Our work indicates that proteins which physically interact show stronger signals of coevolution than randomly generated pairs. Interactions with stronger empirical and computational evidence also show stronger signals of coevolution. We examined the effects of the number of protein interactions and of gene expression levels on coevolution, finding that both factors are, overall, poor predictors of the strength of coevolution between a protein pair. Simulations further demonstrate the potential issues of analyzing gene expression coevolution in a standard hypothesis testing framework without accounting for shared ancestry. Furthermore, our simulations indicate that using a randomly generated null distribution to determine statistical significance when detecting coevolving genes with phylogenetically uncorrected correlations, as has previously been done, is less accurate than PCMs, although it is a significant improvement over standard hypothesis testing. These methods are further improved by using a phylogenetically corrected correlation metric.
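To illustrate what a phylogenetically corrected correlation means in practice, the following is a minimal sketch, not the authors' implementation. It assumes the standard generalized least squares estimator of the evolutionary rate matrix under multivariate Brownian Motion, where `C` is the phylogenetic covariance matrix (shared branch lengths between species). The function name `phylo_corrected_correlation`, the four-species tree, and the trait values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def phylo_corrected_correlation(X, C):
    """Evolutionary correlation between two traits under multivariate BM.

    X: (n_species, 2) trait values, e.g. log expression of two genes.
    C: (n_species, n_species) phylogenetic covariance matrix
       (entry i,j = branch length shared by species i and j).
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    n = X.shape[0]
    ones = np.ones((n, 1))
    # GLS estimate of the root (ancestral) state of each trait.
    root = (ones.T @ np.linalg.solve(C, X)) / (ones.T @ np.linalg.solve(C, ones))
    resid = X - ones @ root
    # Estimated evolutionary rate matrix (n - 1 denominator).
    R = resid.T @ np.linalg.solve(C, resid) / (n - 1)
    return R[0, 1] / np.sqrt(R[0, 0] * R[1, 1])

# Hypothetical tree ((A,B),(C,D)) with height 1; each pair of sister
# species shares 0.9 of its history, and the two clades share none.
C_toy = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])

# Hypothetical trait values: both traits differ strongly between clades,
# so the uncorrected correlation is driven largely by one deep split.
X_toy = np.array([
    [2.0, 1.9],
    [2.1, 2.0],
    [0.1, 0.2],
    [0.0, 0.3],
])

r_uncorrected = np.corrcoef(X_toy.T)[0, 1]
r_corrected = phylo_corrected_correlation(X_toy, C_toy)
print(f"uncorrected: {r_uncorrected:.3f}, corrected: {r_corrected:.3f}")
```

When `C` is the identity matrix (a star phylogeny with equal branch lengths), this estimator reduces to the ordinary Pearson correlation, which makes explicit the independence assumption that standard hypothesis testing relies on.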
Our work highlights the potential benefits of using PCMs to detect gene expression coevolution from high-throughput, omics-scale data. This framework can be built upon to investigate other evolutionary hypotheses, such as changes in transcriptional regulatory mechanisms across species.