MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK.
British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
BMC Med Res Methodol. 2024 Feb 10;24(1):34. doi: 10.1186/s12874-024-02153-1.
Mendelian randomization is a popular method for causal inference with observational data that uses genetic variants as instrumental variables. Like a randomized trial, a standard Mendelian randomization analysis estimates the population-averaged effect of an exposure on an outcome. Dividing the population into subgroups can reveal effect heterogeneity and indicate who would benefit most from intervention on the exposure. However, because covariates are measured post-"randomization", naive stratification typically induces collider bias in stratum-specific estimates.
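The collider-bias problem described above can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not the authors' analysis: the instrument `g`, the confounder `u`, the exposure `x` and the covariate `m` are all hypothetical standard-normal constructions, and the covariate is assumed to be a downstream consequence of the exposure. The instrument and confounder are independent in the full sample, but become correlated once the sample is restricted to one covariate stratum.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, seed=1):
    """Return corr(g, u) in the full sample and in the upper covariate stratum."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n)]  # instrument (assumed continuous)
    u = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured confounder
    # Exposure depends on the instrument and the confounder.
    x = [gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
    # Covariate is measured after "randomization" and depends on the exposure.
    m = [xi + rng.gauss(0, 1) for xi in x]

    # Naive stratification: keep individuals above the median covariate value.
    cutoff = sorted(m)[n // 2]
    upper = [i for i in range(n) if m[i] > cutoff]

    overall = pearson(g, u)
    within = pearson([g[i] for i in upper], [u[i] for i in upper])
    return overall, within


overall_corr, stratum_corr = simulate()
print(f"corr(g, u), full sample:   {overall_corr:.3f}")
print(f"corr(g, u), upper stratum: {stratum_corr:.3f}")
```

In this toy setting the full-sample correlation is close to zero, while the within-stratum correlation is clearly negative. The instrument is therefore no longer independent of the confounder inside the stratum, which is what biases a naive stratum-specific estimate.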
We extend a previously proposed stratification method (the "doubly-ranked method") to form strata based on a single covariate, and introduce a data-adaptive random forest method to calculate stratum-specific estimates that are robust to collider bias based on a high-dimensional covariate set. We also propose measures based on the Q statistic to assess heterogeneity between stratum-specific estimates (to understand whether estimates are more variable than expected due to chance alone) and variable importance (to identify the key drivers of effect heterogeneity).
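As an illustration of the heterogeneity assessment mentioned above, the following sketch computes the standard Cochran's Q statistic for a set of stratum-specific estimates. It assumes independent estimates with known standard errors and inverse-variance weights; the function name `cochran_q` and the input values are hypothetical, and the specific Q-based measures proposed in the paper may differ in detail.

```python
def cochran_q(estimates, std_errors):
    """Cochran's Q for stratum-specific estimates.

    Returns (Q, degrees of freedom, inverse-variance-weighted pooled estimate).
    Under the null of a common effect, Q is approximately chi-squared
    distributed with (number of strata - 1) degrees of freedom.
    """
    if len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must have the same length")
    if len(estimates) < 2:
        raise ValueError("at least two strata are needed")

    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return q, len(estimates) - 1, pooled


# Hypothetical stratum-specific estimates and standard errors (not from the paper).
example_estimates = [-0.10, -0.05, 0.02]
example_std_errors = [0.02, 0.02, 0.02]

q_stat, q_df, pooled_estimate = cochran_q(example_estimates, example_std_errors)
print(f"Q = {q_stat:.2f} on {q_df} degrees of freedom; pooled = {pooled_estimate:.4f}")
```

A value of Q that is large relative to its degrees of freedom suggests that the stratum-specific estimates vary more than would be expected by chance alone.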
We show that the effect of body mass index (BMI) on lung function is heterogeneous, depending most strongly on hip circumference and weight. While the predicted effect of increasing BMI on lung function is negative for most individuals, it is positive for some and strongly negative for others.
Our data-adaptive approach allows for the exploration of effect heterogeneity in the relationship between an exposure and an outcome within a Mendelian randomization framework. This can yield valuable insights into disease aetiology and help identify specific groups of individuals who would derive the greatest benefit from targeted interventions on the exposure.