Centre National de la Recherche Scientifique, Grenoble INP, TIMC-IMAG CNRS UMR 5525, Université Grenoble-Alpes, Grenoble, France.
Mol Ecol Resour. 2021 Nov;21(8):2738-2748. doi: 10.1111/1755-0998.13366. Epub 2021 Mar 29.
A major objective of evolutionary biology is to understand the processes by which organisms have adapted to various environments, and to predict the response of organisms to new or future conditions. The availability of large genomic and environmental data sets provides an opportunity to address these questions, and the R package LEA was introduced to facilitate population and ecological genomic analyses in this context. Using latent factor models, the program computes ancestry coefficients from population genetic data and performs genotype-environment association analyses with correction for unobserved confounding variables. In this study, we present new functionalities of LEA, which include imputation of missing genotypes, fast algorithms for latent factor mixed models using multivariate predictors for genotype-environment association studies, population differentiation tests for admixed or continuous populations, and estimation of genetic offset based on climate models. The new functionalities are implemented in version 3.1 and later releases of the package. Using simulated and real data sets, our study provides evaluations and examples of applications, outlining important practical considerations when analysing ecological genomic data in R.