Université de Paris, MERIT, IRD, 75006, Paris, France.
Université Paris-Saclay, UVSQ, Inserm, CESP, 94807, Villejuif, France.
BMC Bioinformatics. 2020 Nov 23;21(1):536. doi: 10.1186/s12859-020-03862-2.
Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, with case-control status analyzed as a quantitative phenotype. Chen et al. showed in 2016 that this method is inappropriate in some situations and proposed GMMAT, a score test for the mixed logistic regression (MLR). However, this test does not produce an estimate of the variants' effects. We propose two computationally efficient methods to estimate the variants' effects. Their properties and those of other methods (MLM, logistic regression) are evaluated using both simulated data and real genomic data from a recent GWAS in two geographically close populations in West Africa.
We show that, when the disease prevalence differs between population strata, MLM is inappropriate for analyzing binary traits. MLR performs best in all circumstances. The variants' effects are well estimated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, which improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample.
The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. a mixed Cox model for survival analysis).
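To illustrate the idea of a stratified QQ-plot, the following is a minimal Python sketch that computes one QQ curve per variant category. It is not the milorGWAS implementation: the function names `qq_points` and `stratified_qq`, and the assumption that each variant already carries a category label, are hypothetical, since the abstract does not state how variants are assigned to categories.

```python
import numpy as np


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-values for a single QQ curve."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    # Expected quantiles under the null hypothesis: (i - 0.5) / n for i = 1..n.
    expected = (np.arange(1, n + 1) - 0.5) / n
    return -np.log10(expected), -np.log10(p)


def stratified_qq(pvalues, labels):
    """Compute one QQ curve per variant category.

    `labels` is a hypothetical per-variant category (an assumption of this
    sketch); the result maps each category to its (expected, observed) arrays.
    """
    pvalues = np.asarray(pvalues, dtype=float)
    labels = np.asarray(labels)
    curves = {}
    for lab in np.unique(labels).tolist():
        curves[lab] = qq_points(pvalues[labels == lab])
    return curves
```

Plotting each category's observed values against its expected values on the same axes would then show whether inflation or deflation is concentrated in particular categories of variants rather than spread evenly across the genome.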