Warrington Nicole M, Tilling Kate, Howe Laura D, Paternoster Lavinia, Pennell Craig E, Wu Yan Yan, Briollais Laurent
Stat Appl Genet Mol Biol. 2014 Oct;13(5):567-87. doi: 10.1515/sagmb-2013-0066.
Genome-wide association studies have been successful in uncovering novel genetic variants that are associated with disease status or cross-sectional phenotypic traits. Researchers are beginning to investigate how genes play a role in the development of a trait over time. Linear mixed effects models (LMM) are commonly used to model longitudinal data; however, it is unclear whether failure to meet the model's distributional assumptions will affect the conclusions when conducting a genome-wide association study. In an extensive simulation study, we compare coverage probabilities, bias, type 1 error rates and statistical power when the error of the LMM is either heteroscedastic or has a non-Gaussian distribution. We conclude that the model is robust to misspecification if the same function of age is included in the fixed and random effects. However, the type 1 error rate for the genetic effect over time is inflated, regardless of the model misspecification, if the polynomial function for age in the fixed and random effects differs. In situations where the model will not converge with a high-order polynomial function in the random effects, a reduced function can be used, but a robust standard error needs to be calculated to avoid inflation of the type 1 error rate. As an illustration, an LMM was applied to longitudinal body mass index (BMI) data over childhood in the ALSPAC cohort; the results emphasised the need for the robust standard error to ensure correct inference of associations of longitudinal BMI with chromosome 16 single nucleotide polymorphisms.
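To make the comparison in the abstract concrete, the following is an illustrative sketch of the kind of model being discussed. The abstract does not give the specification, so the quadratic age function, the symbols, and the particular sandwich form below are assumptions chosen for illustration, not the authors' stated model. Here $y_{ij}$ is the trait for individual $i$ at occasion $j$, $t_{ij}$ is age, and $g_i$ is the SNP genotype.

```latex
% Matched specification: the same (here quadratic) function of age
% appears in both the fixed and the random effects.
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2
       + \gamma_0 g_i + \gamma_1 g_i t_{ij}
       + b_{0i} + b_{1i} t_{ij} + b_{2i} t_{ij}^2 + \varepsilon_{ij},
\qquad
(b_{0i}, b_{1i}, b_{2i})^{\top} \sim N(0, D), \quad
\varepsilon_{ij} \sim N(0, \sigma^2).

% Reduced specification: the random effects drop the quadratic term
% (b_{2i} = 0), so the age function differs between fixed and random effects.

% One common robust (sandwich) covariance estimator for the fixed effects
% \theta = (\beta, \gamma), with X_i the fixed-effects design matrix,
% V_i the model-based covariance of y_i, and r_i = y_i - X_i \hat{\theta}:
\widehat{\operatorname{Var}}(\hat{\theta}) =
  \Bigl( \sum_i X_i^{\top} V_i^{-1} X_i \Bigr)^{-1}
  \Bigl( \sum_i X_i^{\top} V_i^{-1} r_i r_i^{\top} V_i^{-1} X_i \Bigr)
  \Bigl( \sum_i X_i^{\top} V_i^{-1} X_i \Bigr)^{-1}.
```

In this notation, the "genetic effect over time" corresponds to $\gamma_1$. The sandwich form replaces the model-based variance of $y_i$ with the empirical residual cross-product $r_i r_i^{\top}$, so it remains valid when $V_i$ is misspecified, which is the situation that arises under the reduced specification.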