Department of Biology, University of York, York, YO10 5DD, UK.
J Mol Evol. 2021 Dec;89(9-10):601-610. doi: 10.1007/s00239-021-10028-y. Epub 2021 Aug 26.
Which variables determine the constraints on gene sequence evolution is one of the central questions in molecular evolution. In the fission yeast Schizosaccharomyces pombe, an important model organism, the variables influencing the rate of sequence evolution have yet to be determined. Previous studies in other single-celled organisms have generally found gene expression level to be the most significant variable, with numerous others, such as gene length and functional importance, identified as having a smaller impact. We applied partial least squares regression, principal components regression, and partial correlations to publicly available data to determine the variables most strongly associated with sequence evolution constraints. We identify centrality in the protein-protein interaction network, amino acid composition, and cellular location as the most important determinants of sequence conservation. However, each factor explains only a small amount of variance, and there are numerous variables having a significant or heterogeneous influence. Our models explain more than half of the variance in dN, raising the possibility that future refined models could quantify the role of stochasticity in evolutionary rate variation.
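As an illustration of one of the methods named in the abstract, the following is a minimal sketch of a first-order partial correlation: the correlation between two variables after the linear effect of a third has been removed. It is not the authors' pipeline. The function names (`pearson`, `partial_corr`), the variable names, and the toy numbers are all assumptions made for illustration and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed.

    Uses the standard first-order formula:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy values (hypothetical, not from the paper): two predictors that both
# track a shared covariate but are otherwise unrelated to each other.
expression = [1.0, 2.0, 3.0, 4.0]   # shared covariate (z)
rate = [2.0, 1.0, 2.0, 5.0]         # hypothetical stand-in for dN (x)
centrality = [0.0, 5.0, 0.0, 5.0]   # hypothetical network centrality (y)

raw = pearson(rate, centrality)
controlled = partial_corr(rate, centrality, expression)
print(f"raw correlation: {raw:.3f}")
print(f"partial correlation given expression: {controlled:.3f}")
```

In this toy case the raw correlation between the two variables is entirely due to the shared covariate, so it vanishes once that covariate is controlled for. This is the kind of confounding that partial correlations are meant to separate out when many gene-level variables are correlated with one another.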