Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
Medical Research Council Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK.
BMC Bioinformatics. 2022 Jul 27;23(1):305. doi: 10.1186/s12859-022-04835-3.
Heritability and genetic correlation can be estimated from genome-wide single-nucleotide polymorphism (SNP) data using various methods. We recently developed multivariate genomic-relatedness-based restricted maximum likelihood (MGREML) for statistically and computationally efficient estimation of SNP-based heritability and genetic correlation across many traits in large datasets. Here, we extend MGREML by allowing it to fit and perform tests on user-specified factor models, while preserving the low computational complexity.
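As a rough illustration of how a factor model maps onto these quantities, the sketch below builds genetic and environmental covariance matrices from loading matrices and derives heritabilities and a genetic correlation from them. It assumes the textbook definitions (heritability as genetic variance over total variance, genetic correlation as standardized genetic covariance); the function names and the loading values are hypothetical and are not taken from the MGREML tool.

```python
from math import sqrt


def covariance_from_loadings(loadings):
    """Return L L^T for a loading matrix (rows = traits, columns = factors)."""
    n = len(loadings)
    return [
        [sum(a * b for a, b in zip(loadings[i], loadings[j])) for j in range(n)]
        for i in range(n)
    ]


def heritabilities(v_g, v_e):
    """Genetic variance divided by total variance, per trait."""
    return [v_g[i][i] / (v_g[i][i] + v_e[i][i]) for i in range(len(v_g))]


def genetic_correlation(v_g, i, j):
    """Genetic covariance of traits i and j, scaled by their genetic SDs."""
    return v_g[i][j] / sqrt(v_g[i][i] * v_g[j][j])


# Hypothetical loadings for two traits on two genetic factors.
# A zero entry means that factor is not allowed to affect that trait.
genetic_loadings = [[1.0, 0.0],
                    [0.6, 0.8]]
# Hypothetical environmental loadings: one independent factor per trait.
environment_loadings = [[1.0, 0.0],
                        [0.0, 1.0]]

v_g = covariance_from_loadings(genetic_loadings)
v_e = covariance_from_loadings(environment_loadings)

h2 = heritabilities(v_g, v_e)
r_g = genetic_correlation(v_g, 0, 1)
print("heritabilities:", h2)
print("genetic correlation:", r_g)
```

In this toy setup, forcing the second genetic factor's loading to zero would leave both traits driven by a single genetic factor, which is the kind of restriction a nested model imposes.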
Using simulations, we show that MGREML yields consistent estimates and valid inferences for such factor models at low computational cost (e.g., for data on 50 traits and 20,000 individuals, a saturated model involving 50 SNP-based heritabilities, 1225 genetic correlations, and 50 fixed effects is estimated and compared to a restricted model in less than one hour on a single notebook with two 2.7 GHz cores and 16 GB of RAM). Using repeated measures of height and body mass index from the US Health and Retirement Study, we illustrate the ability of MGREML to estimate a factor model and test whether it fits the data better than a nested model. The MGREML tool, the simulation code, and an extensive tutorial are freely available at https://github.com/devlaming/mgreml/.
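The parameter counts quoted for the saturated model follow from simple combinatorics: one heritability per trait and one genetic correlation per unordered pair of distinct traits. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def saturated_model_counts(n_traits):
    """Return (number of heritabilities, number of genetic correlations)."""
    n_heritabilities = n_traits
    # One correlation per unordered pair of distinct traits: T * (T - 1) / 2.
    n_genetic_correlations = n_traits * (n_traits - 1) // 2
    return n_heritabilities, n_genetic_correlations


counts_50 = saturated_model_counts(50)
print(counts_50)
```

For 50 traits this gives 50 heritabilities and 50 × 49 / 2 = 1225 genetic correlations, matching the figures above.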
MGREML can now be used to estimate multivariate factor structures and perform inferences on such factor models at low computational cost. This new feature enables simple structural equation modeling using MGREML, allowing researchers to specify, estimate, and compare genetic factor models of their choosing using SNP data.