Center for Computational Biology, University of California, Berkeley, CA 94720, USA.
Departments of Integrative Biology and Statistics, University of California, Berkeley, CA 94720, USA.
Mol Biol Evol. 2024 Aug 2;41(8). doi: 10.1093/molbev/msae156.
We present CLUES2, a full-likelihood method for inferring natural selection from sequence data that extends the method CLUES. We make several substantial improvements to CLUES that greatly increase both its applicability and its speed. We add the ability to use ancestral recombination graphs on ancient data as emissions to the underlying hidden Markov model, which enables CLUES2 to use both temporal and linkage information when estimating selection coefficients. We also fully implement the ability to estimate distinct selection coefficients in different epochs, which allows for the analysis of changes in selective pressures through time, as well as selection with dominance. In addition, we greatly increase the computational efficiency of CLUES2 over CLUES using several approximations to the forward-backward algorithms, and we develop a new way to reconstruct historic allele frequencies by integrating over the uncertainty in the estimation of the selection coefficients. We illustrate the accuracy of CLUES2 through extensive simulations and validate the importance sampling framework for integrating over the uncertainty in the inference of gene trees. We also show that CLUES2 is well calibrated: under the null hypothesis, the distribution of log-likelihood ratios follows a χ² distribution with the appropriate degrees of freedom. We run CLUES2 on a set of recently published ancient human data from Western Eurasia and test for evidence of changing selection coefficients through time. We find significant evidence of changing selective pressures in several genes correlated with the introduction of agriculture to Europe and the ensuing dietary and demographic shifts of that time. In particular, our analysis supports previous hypotheses of strong selection on lactase persistence during periods of ancient famines and attenuated selection in more modern periods.
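The calibration claim above rests on a standard likelihood-ratio test, which can be sketched in a few lines. The following is a minimal illustration, not the CLUES2 implementation: the function names `chi2_sf` and `lrt_pvalue` are hypothetical, the log-likelihood values are made up for the example, and the degrees of freedom are assumed to equal the number of free selection coefficients (for instance, one per epoch).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses the closed-form series that exist for integer degrees of freedom,
    so only the standard library is needed.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i>=1} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)
    return total


def lrt_pvalue(loglik_null: float, loglik_alt: float, df: int) -> tuple[float, float]:
    """Return (test statistic, p-value) for a likelihood-ratio test.

    loglik_null: maximized log-likelihood under the null (assumed here: no selection).
    loglik_alt:  maximized log-likelihood under the alternative.
    df:          assumed number of extra free parameters in the alternative.
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a two-epoch model (df = 2), illustration only.
stat, p = lrt_pvalue(loglik_null=-1000.0, loglik_alt=-997.0, df=2)
```

If a method is well calibrated in this sense, p-values computed this way from data simulated under the null should be approximately uniform on [0, 1].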