Institute of Biometry and Clinical Epidemiology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany.
Center for Medical Data Science, Institute of Clinical Biometrics, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria.
Int J Environ Res Public Health. 2023 Feb 11;20(4):3182. doi: 10.3390/ijerph20043182.
Randomization is an effective design option to prevent bias from confounding in the evaluation of the causal effect of interventions on outcomes. However, in some cases, randomization is not possible, making subsequent adjustment for confounders essential to obtain valid results. Several methods exist to adjust for confounding, with multivariable modeling being among the most widely used. The main challenge is to determine which variables should be included in the causal model and to specify appropriate functional relations for continuous variables in the model. While the statistical literature gives a variety of recommendations on how to build multivariable regression models in practice, this guidance is often unknown to applied researchers. We set out to investigate the current practice of explanatory regression modeling to control confounding in the field of cardiac rehabilitation, for which mainly non-randomized observational studies are available. In particular, we conducted a systematic methods review to identify and compare statistical methodology with respect to statistical model building in the context of the existing recent systematic review CROS-II, which evaluated the prognostic effect of cardiac rehabilitation. CROS-II identified 28 observational studies, which were published between 2004 and 2018. Our methods review revealed that 24 (86%) of the included studies used methods to adjust for confounding. Of these, 11 (46%) mentioned how the variables were selected and two studies (8%) considered functional forms for continuous variables. The use of background knowledge for variable selection was barely reported and data-driven variable selection methods were applied frequently. We conclude that in the majority of studies, the methods used to develop models to investigate the effect of cardiac rehabilitation on outcomes do not meet common criteria for appropriate statistical model building and that reporting often lacks precision.
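The reported percentages use two different denominators: the 86% refers to all 28 included studies, while the 46% and 8% refer only to the 24 studies that adjusted for confounding. The following minimal sketch recomputes them from the counts given above; the variable names and the `percent` helper are illustrative assumptions, not part of the original review.

```python
# Counts as reported in the abstract; names are hypothetical, chosen for illustration.
total_studies = 28       # observational studies identified by CROS-II
adjusted = 24            # studies that used methods to adjust for confounding
reported_selection = 11  # adjusted studies that mentioned how variables were selected
functional_forms = 2     # adjusted studies that considered functional forms


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


summary = {
    # Denominator: all included studies
    "adjusted_for_confounding": percent(adjusted, total_studies),
    # Denominator: only the studies that adjusted for confounding
    "reported_variable_selection": percent(reported_selection, adjusted),
    "considered_functional_forms": percent(functional_forms, adjusted),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

Relative to all 28 studies rather than the 24 that adjusted, the last two shares would be smaller, so the choice of denominator matters when comparing these figures with other reviews.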