Université Grenoble-Alpes, Centre National de la Recherche Scientifique, Grenoble INP, TIMC-IMAG CNRS UMR 5525, Grenoble 38000, France.
Université Grenoble-Alpes, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Institute for Advanced Biosciences, INSERM U 1209 - CNRS UMR 5309, Grenoble 38000, France.
Mol Biol Evol. 2019 Apr 1;36(4):852-860. doi: 10.1093/molbev/msz008.
Gene-environment association (GEA) studies are essential to understand the past and ongoing adaptations of organisms to their environment, but those studies are complicated by confounding due to unobserved demographic factors. Although the confounding problem has recently received considerable attention, the proposed approaches do not scale to the high dimensionality of genomic data. Here, we present a new estimation method for latent factor mixed models (LFMMs), implemented in an upgraded version of the corresponding computer program. We developed a least-squares approach for confounder estimation that provides a single framework for several categories of genomic data, not restricted to genotypes. The new algorithm is several orders of magnitude faster than existing GEA approaches and than our previous version of the LFMM program. In addition, the new method outperforms other fast approaches based on principal component or surrogate variable analysis. We illustrate the use of the program with analyses of the 1000 Genomes Project data set, leading to new findings on the adaptation of humans to their environment, and with analyses of DNA methylation profiles, providing insights into how tobacco consumption could affect DNA methylation in patients with rheumatoid arthritis. Software availability: Software is available in the R package lfmm at https://bcm-uga.github.io/lfmm/.