Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, United States of America.
Department of Statistics, Florida State University, Tallahassee, Florida, United States of America.
PLoS Genet. 2024 Apr 22;20(4):e1011246. doi: 10.1371/journal.pgen.1011246. eCollection 2024 Apr.
Genome-wide association studies (GWAS) have identified many genetic loci associated with complex traits and diseases in the past 20 years. Multiple heritable covariates may be added to GWAS regression models to estimate the direct effects of genetic variants on a focal trait, or to improve power by accounting for environmental effects and other sources of trait variation. When one or more covariates are causally affected by both genetic variants and hidden confounders, adjusting for them in GWAS produces biased estimates of SNP effects, known as collider bias. Several approaches have been developed to correct collider bias by estimating the bias with Mendelian randomization (MR). However, these methods handle only one covariate, and some rely on MR methods with relatively strong assumptions that may not hold in practice. In this paper, we extend the bias-correction approaches in two respects: first, we derive an analytical expression for the collider bias in the presence of multiple covariates; second, we propose estimating the bias using a robust multivariable MR (MVMR) method based on constrained maximum likelihood (called MVMR-cML), which allows for invalid instrumental variables (IVs) and correlated pleiotropy. We also establish the consistency and asymptotic normality of the new bias-corrected estimator. We conducted simulations to show that all methods mitigated collider bias under various scenarios. In real data analyses, we applied the methods to two GWAS examples: the first a GWAS of waist-hip ratio adjusted for a single covariate, body-mass index (BMI), and the second a GWAS of BMI adjusted for metabolomic principal components as multiple covariates, illustrating the effectiveness of bias correction.
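The mechanism behind collider bias described above can be illustrated with a small simulation. This is a minimal sketch, not the paper's method or the MVMR-cML estimator: the function name, sample size, allele frequency, and effect sizes are all assumptions chosen for illustration. A single SNP affects a covariate, a hidden confounder affects both the covariate and the focal trait, and the SNP has no direct effect on the trait.

```python
import numpy as np


def simulate_collider_bias(n=200_000, maf=0.3, snp_effect_on_covariate=0.5, seed=0):
    """Toy model (hypothetical parameters): SNP -> covariate <- confounder -> trait.

    The true direct effect of the SNP on the trait is zero.
    Returns the estimated SNP effect without and with covariate adjustment.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, maf, size=n).astype(float)  # SNP genotype coded 0/1/2
    u = rng.normal(size=n)                          # hidden confounder
    x = snp_effect_on_covariate * g + u + rng.normal(size=n)  # heritable covariate
    y = u + rng.normal(size=n)                      # focal trait, no SNP effect

    ones = np.ones(n)
    # Ordinary least squares of the trait on the SNP alone.
    unadjusted = np.linalg.lstsq(np.column_stack([ones, g]), y, rcond=None)[0][1]
    # Ordinary least squares of the trait on the SNP and the covariate.
    adjusted = np.linalg.lstsq(np.column_stack([ones, g, x]), y, rcond=None)[0][1]
    return unadjusted, adjusted


unadjusted, adjusted = simulate_collider_bias()
print(f"SNP effect without covariate adjustment: {unadjusted:.3f}")
print(f"SNP effect with covariate adjustment:    {adjusted:.3f}")
```

Under these assumed settings the unadjusted estimate should sit near the true value of zero, while the adjusted estimate should sit near -0.25. That value is the SNP effect on the covariate (0.5) times the confounded covariate-trait slope (0.5), with the sign flipped. Correction methods of the kind summarized above aim to estimate and remove this bias term.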