Nicholas School of the Environment, Duke University, Durham, North Carolina.
Department of Biology, Pennsylvania State University, University Park, Pennsylvania.
Mol Ecol. 2018 May;27(9):2215-2233. doi: 10.1111/mec.14584. Epub 2018 Apr 23.
Identifying adaptive loci can provide insight into the mechanisms underlying local adaptation. Genotype-environment association (GEA) methods, which identify these loci based on correlations between genetic and environmental data, are particularly promising. Univariate methods have dominated GEA, despite the high-dimensional nature of genotype and environment. Multivariate methods, which analyse many loci simultaneously, may be better suited to these data as they consider how sets of markers covary in response to environment. These methods may also be more effective at detecting adaptive processes that result in weak, multilocus signatures. Here, we evaluate four multivariate methods and five univariate and differentiation-based approaches, using published simulations of multilocus selection. We found that Random Forest performed poorly for GEA. Univariate GEAs performed better, but had low detection rates for loci under weak selection. Constrained ordinations, particularly redundancy analysis (RDA), showed a superior combination of low false-positive and high true-positive rates across all levels of selection. These results were robust across the demographic histories, sampling designs, sample sizes and weak population structure tested here. The value of combining detections from different methods was variable and depended on the study goals and knowledge of the drivers of selection. Re-analysis of genomic data from grey wolves highlighted the unique, covarying sets of adaptive loci that could be identified using RDA. Although additional testing is needed, this study indicates that RDA is an effective means of detecting adaptation, including signatures of weak, multilocus selection, providing a powerful tool for investigating the genetic basis of local adaptation.
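The abstract names redundancy analysis (RDA) without describing its mechanics. As a rough illustration only, the sketch below shows the general idea in NumPy: regress all loci on the environmental predictors at once, then take the principal axes of the fitted values and inspect each locus's loading on them. The function names `rda_loadings` and `flag_outliers` are hypothetical, and the 3-standard-deviation cutoff is an assumption made for this example. None of it is taken from the abstract or claimed to match the authors' implementation.

```python
import numpy as np

def rda_loadings(genotypes, environment, n_axes=2):
    """Sketch of RDA: locus loadings on the first constrained axes.

    genotypes:   (individuals x loci) array, e.g. allele counts 0/1/2
    environment: (individuals x predictors) array
    """
    Y = genotypes - genotypes.mean(axis=0)        # centre each locus
    X = environment - environment.mean(axis=0)    # centre each predictor
    X = X / X.std(axis=0, ddof=1)                 # scale predictors to unit variance
    # Multivariate least squares: the part of Y explained by the environment
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_fit = X @ B
    # PCA (via SVD) of the fitted values yields the constrained axes
    _, _, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    return Vt[:n_axes].T                          # shape: (loci, n_axes)

def flag_outliers(loadings, z=3.0):
    """Assumed cutoff: loci more than z SDs from the mean loading on any axis."""
    dev = np.abs(loadings - loadings.mean(axis=0)) / loadings.std(axis=0, ddof=1)
    return np.where((dev > z).any(axis=1))[0]
```

Because the axes are derived from all loci jointly, a locus is flagged based on how it covaries with the others along an environmental gradient. This is the property that distinguishes the multivariate approach from testing one locus at a time.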