Wang Haohan, Aragam Bryon, Xing Eric P
School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
Booth School of Business, University of Chicago, Chicago, Illinois, USA.
J Comput Biol. 2022 Mar;29(3):233-242. doi: 10.1089/cmb.2021.0157. Epub 2022 Feb 25.
Motivated by empirical arguments well known from the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate single nucleotide polymorphism (SNP) in the kinship matrix, a shortcut often taken in practice to speed up computation. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification for this technique, which trades off velocity against veracity. Second, we investigate how mixed models correct for confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding (population stratification and environmental confounding factors) and study how different methods commonly used in practice trade off these two sources differently.
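To make the candidate-SNP question concrete, here is a minimal sketch. It assumes the common realized-relationship kinship K = ZZ^T/p, built from column-standardized genotypes. That construction is an assumption, since the abstract does not give the paper's exact estimator. The genotypes are simulated, and the helper names (`standardize`, `kinship`, `simulate_genotypes`, `inclusion_error`) are hypothetical. Under this assumption, keeping SNP j in the kinship is a rank-one update scaled by 1/p, so its effect on K shrinks as the number of SNPs grows.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Center and scale each SNP column to mean 0, variance 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def kinship(Z: np.ndarray) -> np.ndarray:
    """Realized relationship matrix K = Z Z^T / p (an assumed construction)."""
    return Z @ Z.T / Z.shape[1]


def simulate_genotypes(n: int, p: int, seed: int = 0) -> np.ndarray:
    """Hypothetical data: 0/1/2 allele counts drawn under Hardy-Weinberg proportions."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.5, size=p)
    X = rng.binomial(2, freqs, size=(n, p)).astype(float)
    return X[:, X.std(axis=0) > 0]  # drop monomorphic SNPs to avoid dividing by zero


def inclusion_error(Z: np.ndarray, j: int) -> float:
    """Relative Frobenius change in K caused by keeping candidate SNP j in the kinship."""
    K_full = kinship(Z)
    K_loo = kinship(np.delete(Z, j, axis=1))  # leave-one-out kinship without SNP j
    return float(np.linalg.norm(K_full - K_loo) / np.linalg.norm(K_loo))


Z = standardize(simulate_genotypes(n=200, p=2000))
p, j = Z.shape[1], 0
z_j = Z[:, j]
K_full = kinship(Z)
K_loo = kinship(np.delete(Z, j, axis=1))
# Including SNP j is a rank-one update: K_full = ((p - 1) * K_loo + z_j z_j^T) / p
assert np.allclose(K_full, ((p - 1) * K_loo + np.outer(z_j, z_j)) / p)

# The perturbation from including the candidate SNP shrinks as p grows
for p_sim in (100, 2000):
    Zs = standardize(simulate_genotypes(n=200, p=p_sim))
    print(p_sim, inclusion_error(Zs, j=0))
```

The sketch only shows how much K moves. How that perturbation propagates into the association test statistic for SNP j is the question the paper itself analyzes, and it is not reproduced here.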