Lee Paul H, Burstyn Igor
School of Nursing, PQ433, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.
Department of Epidemiology and Biostatistics, Dornsife School of Public Health, Drexel University, Philadelphia, USA.
BMC Med Res Methodol. 2016 May 18;16:54. doi: 10.1186/s12874-016-0159-6.
Common methods for confounder identification, such as directed acyclic graphs (DAGs), hypothesis testing, or a 10 % change-in-estimate (CIE) criterion for estimated associations, may not be applicable when (a) there is insufficient knowledge to draw a DAG or (b) adjustment for a true confounder produces less than a 10 % change in the observed estimate (e.g. in the presence of measurement error).
We compare a previously proposed simulation-based approach for confounder identification, which can be tailored to each specific study, with commonly applied methods (significance criteria with p-value cutoffs of 0.05 or 0.20, and the CIE criterion with a cutoff of 10 %) and with a newly proposed two-stage procedure aimed at reducing false positives (specifically, risk factors that are not confounders). The new procedure first evaluates the potential for confounding by examining the correlation of covariates, and applies the simulated CIE criterion only if there is evidence of correlation; otherwise it rejects the covariate as a confounder. These approaches are compared in simulation studies with binary, continuous, and survival outcomes. We illustrate our proposed confounder identification strategy by examining the association between mercury exposure and depression in the presence of suspected confounding by fish intake, using National Health and Nutrition Examination Survey (NHANES) 2009-2010 data.
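The two-stage idea can be sketched in code for a continuous outcome. This is an illustrative sketch, not the authors' implementation. It assumes a linear model, reads "correlation of covariates" as the correlation between the candidate covariate and the exposure (tested with a Fisher z approximation), defines CIE relative to the crude estimate, and takes the simulated cutoff as the 95th percentile of CIE when the covariate is a risk factor independent of exposure. All function names and parameter values are hypothetical.

```python
import numpy as np

def slope_on_exposure(y, x, z=None):
    """OLS coefficient of exposure x, optionally adjusted for covariate z."""
    cols = [np.ones_like(x), x] + ([z] if z is not None else [])
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return beta[1]

def change_in_estimate(y, x, z):
    """Relative change in the exposure coefficient after adjusting for z
    (assumption: expressed relative to the crude estimate)."""
    crude = slope_on_exposure(y, x)
    adjusted = slope_on_exposure(y, x, z)
    return abs(crude - adjusted) / abs(crude)

def simulated_cutoff(n, beta_x, beta_z, n_sim=500, quantile=0.95, seed=0):
    """Study-specific CIE cutoff: upper quantile of CIE when z is a risk
    factor for y but independent of x, i.e. not a confounder."""
    rng = np.random.default_rng(seed)
    cies = np.empty(n_sim)
    for i in range(n_sim):
        x = rng.normal(size=n)
        z = rng.normal(size=n)  # generated independently of x
        y = beta_x * x + beta_z * z + rng.normal(size=n)
        cies[i] = change_in_estimate(y, x, z)
    return float(np.quantile(cies, quantile))

def two_stage(y, x, z, cutoff, z_crit=1.96):
    """Stage 1: screen on exposure-covariate correlation (Fisher z test).
    Stage 2: apply the simulated CIE cutoff only if stage 1 is passed."""
    r = np.corrcoef(x, z)[0, 1]
    fisher_z = np.arctanh(r) * np.sqrt(len(x) - 3)
    if abs(fisher_z) < z_crit:
        return False  # no evidence of correlation: reject as confounder
    return bool(change_in_estimate(y, x, z) > cutoff)

# Hypothetical data in which z truly confounds the x-y association.
rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
x = 0.7 * z + rng.normal(size=n)
y = 0.5 * x + 1.0 * z + rng.normal(size=n)
cutoff = simulated_cutoff(n, beta_x=0.5, beta_z=1.0)
print(cutoff, two_stage(y, x, z, cutoff))
```

Binary or survival outcomes would need logistic or Cox models in place of the least-squares fit; the screening-then-cutoff structure stays the same.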
Our simulations showed that the simulation-determined cutoff was very sensitive to measurement error in the exposure and the potential confounder. The analysis of NHANES data demonstrated that if the noise-to-signal ratio (error variance in confounder/variance of confounder) is at or below 0.5, roughly 80 % of the simulated analyses adjusting for fish consumption would correctly result in a null association of mercury and depression, and that only an extremely poorly measured confounder is not useful to adjust for in this setting.
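The role of the noise-to-signal ratio can be illustrated with a small simulation. The sketch below is not the NHANES analysis. It assumes a linear model in which the exposure has no true effect on the outcome, a confounder with unit variance, and classical additive measurement error whose variance equals the chosen ratio times the confounder variance. The function name and the effect sizes `a` and `b` are hypothetical.

```python
import numpy as np

def adjusted_exposure_slope(noise_to_signal, n=20000, a=0.5, b=0.5, seed=0):
    """Exposure coefficient after adjusting for a mismeasured confounder.

    The true exposure effect is zero, so any non-zero value returned is
    residual confounding caused by measurement error in the confounder.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                  # true confounder
    x = a * z + rng.normal(size=n)          # exposure depends on z
    y = b * z + rng.normal(size=n)          # outcome depends on z only
    # error variance = noise_to_signal * variance of the confounder
    err_sd = np.sqrt(noise_to_signal * z.var())
    w = z + rng.normal(scale=err_sd, size=n)  # observed, error-prone confounder
    design = np.column_stack([np.ones(n), x, w])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

for ratio in (0.0, 0.5, 4.0):
    print(ratio, adjusted_exposure_slope(ratio))
```

Under these assumptions the adjusted coefficient moves away from zero as the ratio grows, which is the sense in which a poorly measured confounder becomes less useful to adjust for.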
No a priori criterion developed for a specific application is guaranteed to be suitable for confounder identification in general. The customization of model-building strategies and study designs through simulations that consider the likely imperfections in the data, as well as finite-sample behavior, would constitute an important improvement on some of the currently prevailing practices in confounder identification and evaluation.