Department of Biology, Pennsylvania State University, University Park, PA, USA.
Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA.
Mol Biol Evol. 2022 Jan 7;39(1). doi: 10.1093/molbev/msab291.
In evolutionary genomics, it is fundamentally important to understand how characteristics of genomic sequences, such as gene expression level, determine the rate of adaptive evolution. While numerous statistical methods, such as the McDonald-Kreitman (MK) test, are available to examine the association between genomic features and the rate of adaptation, we currently lack a statistical approach to disentangle the independent effect of a genomic feature from the effects of other correlated genomic features. To address this problem, I present a novel statistical model, the MK regression, which augments the MK test with a generalized linear model. Analogous to the classical multiple regression model, the MK regression can analyze multiple genomic features simultaneously to infer the independent effect of a genomic feature, holding constant all other genomic features. Using the MK regression, I identify numerous genomic features driving positive selection in chimpanzees. These features include well-known ones, such as local mutation rate, residue exposure level, tissue specificity, and immune genes, as well as new features not previously reported, such as gene expression level and metabolic genes. In particular, I show that highly expressed genes may have a higher adaptation rate than their weakly expressed counterparts, even though a higher expression level may impose stronger negative selection. Also, I show that metabolic genes may have a higher adaptation rate than their nonmetabolic counterparts, possibly due to recent changes in diet in primate evolution. Overall, the MK regression is a powerful approach to elucidate the genomic basis of adaptation.
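For readers unfamiliar with the classical MK test that the MK regression builds on, the following is a minimal sketch of that test only. It is not the MK regression itself, whose model details are not given in this abstract. The function name `mk_test`, its parameter names, and the example counts are illustrative assumptions. The sketch compares nonsynonymous and synonymous counts between divergence and polymorphism with a two-sided Fisher's exact test and reports the commonly used estimate of the proportion of adaptive substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps).

```python
from math import comb


def mk_test(dn, ds, pn, ps):
    """Classical McDonald-Kreitman test on a 2x2 table of counts.

    dn, ds: nonsynonymous / synonymous fixed differences (divergence)
    pn, ps: nonsynonymous / synonymous polymorphisms
    Returns (alpha, two-sided Fisher's exact p-value).
    All names here are illustrative, not taken from the paper.
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")

    # Proportion of nonsynonymous substitutions attributed to adaptation.
    alpha = 1.0 - (ds * pn) / (dn * ps)

    # Fisher's exact test: with all margins fixed, the Dn cell follows
    # a hypergeometric distribution under the null of no association.
    row1 = dn + ds              # all fixed differences
    col1 = dn + pn              # all nonsynonymous sites
    total = dn + ds + pn + ps

    def prob(x):
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    lo = max(0, row1 - (total - col1))
    hi = min(row1, col1)
    p_obs = prob(dn)
    # Two-sided p-value: sum over tables no more probable than the observed one.
    p_value = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )
    return alpha, min(1.0, p_value)


# Hypothetical counts chosen only to illustrate the call.
alpha, p_value = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"alpha = {alpha:.3f}, p = {p_value:.4f}")
```

A single 2x2 test of this kind relates one group of sites to the adaptation rate at a time. The limitation described in the abstract is that it cannot hold other correlated genomic features constant, which is the gap the MK regression is presented as filling.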