The University of British Columbia, Vancouver, British Columbia, Canada.
Environ Res. 2019 Aug;175:108-116. doi: 10.1016/j.envres.2019.05.010. Epub 2019 May 11.
Indirect adjustment via partitioned regression is a promising technique to control for unmeasured confounding in large epidemiological studies. The method uses a representative ancillary dataset to estimate the association between the variables missing from a primary dataset and the complete set of variables in the ancillary dataset, and uses that association to produce an adjusted risk estimate for the variable in question. The objective of this paper is threefold: 1) to evaluate the method for non-linear survival models, 2) to formalize an empirical process for evaluating the suitability of the required ancillary matching dataset, and 3) to test modifications to the method that incorporate time-varying exposure data and proportional weighting of datasets.
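The core idea can be illustrated with a minimal sketch in the linear case, where the omitted-variable bias has a closed form. This is an illustration under stated assumptions, not the procedure used in the paper: the data are simulated, the helper names (`simulate`, `ols`, `indirect_adjust`) are hypothetical, and the coefficient of the missing variable (`gamma`) is assumed to be known from an external source. The paper itself applies the method to survival models.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients for y ~ X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def indirect_adjust(beta_reduced, delta, gamma):
    """Subtract the omitted-variable bias: beta_adj = beta_reduced - delta @ gamma."""
    return beta_reduced - delta @ gamma

def simulate(n, rng):
    """Simulated data (assumption): exposure x, confounder u, outcome y."""
    x = rng.normal(size=n)
    u = 0.5 * x + rng.normal(size=n)            # u is correlated with x
    y = 1.0 * x + 2.0 * u + rng.normal(size=n)  # true effect of x is 1.0
    return x, u, y

rng = np.random.default_rng(0)

# Primary dataset: the confounder u is not observed.
x_p, _, y_p = simulate(100_000, rng)
X_p = np.column_stack([np.ones_like(x_p), x_p])
beta_reduced = ols(X_p, y_p)                    # slope is biased toward 2.0

# Ancillary dataset: x and u are both observed, so we can regress
# the missing variable on the observed ones.
x_a, u_a, _ = simulate(20_000, rng)
X_a = np.column_stack([np.ones_like(x_a), x_a])
delta = ols(X_a, u_a.reshape(-1, 1))            # shape (2, 1)

# Assumed external estimate of the effect of u on y.
gamma = np.array([2.0])

beta_adjusted = indirect_adjust(beta_reduced, delta, gamma)
print("reduced slope :", round(float(beta_reduced[1]), 3))
print("adjusted slope:", round(float(beta_adjusted[1]), 3))
```

In this toy setting the adjusted slope should land close to the true value of 1.0, provided the ancillary sample shares the exposure-confounder relationship of the primary sample. That representativeness condition is what the evaluation described here is designed to test.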
We used the association between fine particle air pollution (PM2.5) and mortality in the 2001 Canadian Census Health and Environment Cohort (CanCHEC, N = 2.4 million, 10-year follow-up) as our primary dataset, and the 2001 cycle of the Canadian Community Health Survey (CCHS, N = 80,630) as the ancillary matching dataset, which contained confounding risk factor information not available in CanCHEC (e.g., smoking). The main evaluation process used a gold-standard approach in which two variables available in both datasets (education and income) were excluded and then indirectly adjusted for; the results were compared with true models that included education and income to assess the amount of bias correction. An internal validation for objective 1 used only CanCHEC data, whereas an external validation for objective 2 replaced CanCHEC with the CCHS. The two proposed modifications were applied as part of the validation tests, as well as in a final indirect adjustment for four missing risk factor variables (smoking, alcohol use, diet, and exercise), in which the direction and magnitude of adjustment were compared with models from an equivalent longitudinal cohort that adjusted directly for the same variables.
At baseline (2001), both cohorts had very similar PM2.5 distributions across population characteristics, although levels for CCHS participants were consistently 1.8-2.0 μg/m³ lower. Applying sample weighting largely corrected for this discrepancy. The internal validation tests showed minimal downward bias in PM2.5 mortality hazard ratios of 0.4-0.6% using a static exposure, and 1.7-3% when a time-varying exposure was used. The external validation of the CCHS as the ancillary dataset showed slight upward bias of -0.7 to -1.1% and downward bias of 1.3-2.3% using the static and time-varying approaches, respectively.
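The bias percentages above can be read as relative differences between the indirectly adjusted hazard ratio and the hazard ratio from the true (directly adjusted) model. The sketch below shows one plausible way to compute such a figure. The sign convention (positive means downward bias) is inferred from the wording above and is an assumption, as are the function name `percent_bias` and the example hazard ratios, which are hypothetical and not taken from the study.

```python
def percent_bias(hr_true, hr_indirect):
    """Relative difference in percent.

    Positive values mean the indirectly adjusted hazard ratio is lower
    than the true one (downward bias); negative values mean upward bias.
    This sign convention is an assumption made for illustration.
    """
    return 100.0 * (hr_true - hr_indirect) / hr_true

# Hypothetical hazard ratios, for illustration only.
downward = percent_bias(hr_true=1.10, hr_indirect=1.0945)
upward = percent_bias(hr_true=1.10, hr_indirect=1.1110)
print(f"{downward:.2f}%  {upward:.2f}%")
```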
The CCHS was found to be reasonably representative of CanCHEC, and its use in Canada for indirect adjustment is warranted. Indirect adjustment methods can be used with survival models to correct hazard ratio point estimates and standard errors in models missing key covariates when a representative matching dataset is available. The results of this formal evaluation should encourage investigators working with other cohorts to assess the suitability of ancillary datasets for applying the indirect adjustment methodology to address potential residual confounding.