A data-adaptive method for investigating effect heterogeneity with high-dimensional covariates in Mendelian randomization.

Affiliations

MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK.

British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.

Publication information

BMC Med Res Methodol. 2024 Feb 10;24(1):34. doi: 10.1186/s12874-024-02153-1.

DOI:10.1186/s12874-024-02153-1

PMID:38341532

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10858611/

Abstract

BACKGROUND

Mendelian randomization is a popular method for causal inference with observational data that uses genetic variants as instrumental variables. Similarly to a randomized trial, a standard Mendelian randomization analysis estimates the population-averaged effect of an exposure on an outcome. Dividing the population into subgroups can reveal effect heterogeneity to inform who would most benefit from intervention on the exposure. However, as covariates are measured post-"randomization", naive stratification typically induces collider bias in stratum-specific estimates.
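The collider bias described above can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating model, the coefficients, and all names (`simulate`, `estimate`, `TRUE_EFFECT`) are assumptions chosen for illustration. The true effect is the same for everyone, the covariate is influenced by both the exposure and an unmeasured confounder, and the ratio (Wald) estimate is computed in the whole sample and then within the two halves of the sample split at the covariate median.

```python
import random

TRUE_EFFECT = 0.5  # assumed homogeneous causal effect of exposure on outcome


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=40000, seed=1):
    """Draw (instrument, exposure, covariate, outcome) rows from an assumed linear model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)                        # genetic instrument
        u = rng.gauss(0, 1)                        # unmeasured confounder
        x = 0.5 * z + u + rng.gauss(0, 1)          # exposure
        c = x + u + rng.gauss(0, 1)                # covariate: a collider of x and u
        y = TRUE_EFFECT * x + u + rng.gauss(0, 1)  # outcome
        rows.append((z, x, c, y))
    return rows


def estimate(rows):
    """Ratio (Wald) estimate: cov(instrument, outcome) / cov(instrument, exposure)."""
    z = [r[0] for r in rows]
    x = [r[1] for r in rows]
    y = [r[3] for r in rows]
    return cov(z, y) / cov(z, x)


rows = simulate()
overall = estimate(rows)

# Naive stratification: split the sample at the median of the covariate.
by_covariate = sorted(rows, key=lambda r: r[2])
half = len(by_covariate) // 2
lower = estimate(by_covariate[:half])
upper = estimate(by_covariate[half:])

print(f"true effect:       {TRUE_EFFECT}")
print(f"whole sample:      {overall:.3f}")
print(f"lower covariate:   {lower:.3f}")
print(f"upper covariate:   {upper:.3f}")
```

Under these assumed parameters the whole-sample estimate should sit close to the true effect, while both stratum-specific estimates are pulled away from it even though no real effect heterogeneity exists. Conditioning on the covariate induces an association between the instrument and the confounder within each stratum, which is the problem that collider-robust stratification is meant to avoid.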

METHOD

We extend a previously proposed stratification method (the "doubly-ranked method") to form strata based on a single covariate, and introduce a data-adaptive random forest method to calculate stratum-specific estimates that are robust to collider bias based on a high-dimensional covariate set. We also propose measures based on the Q statistic to assess heterogeneity between stratum-specific estimates (to understand whether estimates are more variable than expected due to chance alone) and variable importance (to identify the key drivers of effect heterogeneity).
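As a rough illustration of a Q-statistic heterogeneity check, the sketch below computes the standard Cochran's Q across stratum-specific estimates using inverse-variance weights. The abstract does not give the exact form of the paper's Q-based measures, so this standard form is an assumption rather than the authors' definition. The function name `cochran_q` and the input numbers are hypothetical and are not results from the study.

```python
def cochran_q(estimates, std_errors):
    """Return (pooled estimate, Cochran's Q, degrees of freedom).

    Uses inverse-variance weights w_k = 1 / se_k^2 and
    Q = sum_k w_k * (estimate_k - pooled)^2.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    return pooled, q, df


# Hypothetical stratum-specific estimates and standard errors (illustrative only).
estimates = [-0.10, -0.25, 0.05, -0.40]
std_errors = [0.05, 0.05, 0.05, 0.05]

pooled, q, df = cochran_q(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, Q = {q:.1f}, df = {df}")
```

If the stratum-specific estimates differed only by chance, Q would be expected to follow approximately a chi-squared distribution with (number of strata − 1) degrees of freedom, so a Q value far above that reference suggests the estimates vary more than chance alone would explain.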

RESULT

We show that the effect of body mass index (BMI) on lung function is heterogeneous, depending most strongly on hip circumference and weight. For most individuals, the predicted effect of increasing BMI on lung function is negative; however, it is positive for some individuals and strongly negative for others.

CONCLUSION

Our data-adaptive approach allows for the exploration of effect heterogeneity in the relationship between an exposure and an outcome within a Mendelian randomization framework. This can yield valuable insights into disease aetiology and help identify specific groups of individuals who would derive the greatest benefit from targeted interventions on the exposure.
