From the University of California, Berkeley, Berkeley, CA.
Epidemiology. 2024 Nov 1;35(6):791-800. doi: 10.1097/EDE.0000000000001773. Epub 2024 Aug 1.
The Causal Roadmap outlines a systematic approach to asking and answering questions of cause and effect: define the quantity of interest, evaluate needed assumptions, conduct statistical estimation, and carefully interpret results. To protect research integrity, it is essential that the algorithm for statistical estimation and inference be prespecified prior to conducting any effectiveness analyses. However, it is often unclear which algorithm will perform optimally for the real-data application. Instead, there is a temptation to simply implement one's favorite algorithm, recycling prior code or relying on the default settings of a computing package. Here, we call for the use of simulations that realistically reflect the application, including key characteristics such as strong confounding and dependent or missing outcomes, to objectively compare candidate estimators and facilitate full specification of the statistical analysis plan. Such simulations are informed by the Causal Roadmap and conducted after data collection but prior to effect estimation. We illustrate with two worked examples. First, in an observational longitudinal study, we use outcome-blind simulations to inform nuisance parameter estimation and variance estimation for longitudinal targeted minimum loss-based estimation. Second, in a cluster randomized trial with missing outcomes, we use treatment-blind simulations to examine type-I error control in two-stage targeted minimum loss-based estimation. In both examples, realistic simulations empower us to prespecify an estimation approach with strong expected finite sample performance, and also produce quality-controlled computing code for the actual analysis. Together, this process helps to improve the rigor and reproducibility of our research.
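As a rough illustration of the treatment-blind idea in the second example, the sketch below repeatedly re-randomizes arm labels across clusters, so the null hypothesis holds by construction, and records how often a candidate test rejects. It is a minimal sketch under stated assumptions, not the authors' procedure: a cluster-level Welch-type test stands in for two-stage targeted minimum loss-based estimation, the cluster outcomes are synthetic placeholders, and all function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_cluster_outcomes(n_clusters, rng):
    """Hypothetical placeholder for cluster-level outcome summaries.

    In a real treatment-blind simulation these would be derived from the
    collected data, with the true arm labels kept hidden.
    """
    return rng.normal(loc=0.3, scale=0.1, size=n_clusters)


def welch_t_stat(y1, y0):
    """Difference in arm means divided by its unpooled standard error."""
    diff = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return diff / se


def treatment_blind_type1_error(outcomes, n_reps=2000, crit=1.96, seed=0):
    """Estimate type-I error by re-randomizing arm labels across clusters.

    Labels are assigned independently of the outcomes, so every rejection
    is a false positive. `crit` is the (assumed) two-sided critical value.
    """
    rng = np.random.default_rng(seed)
    n = len(outcomes)
    rejections = 0
    for _ in range(n_reps):
        # Exactly n // 2 clusters are assigned to the pseudo-treatment arm.
        arm = rng.permutation(n) < n // 2
        t = welch_t_stat(outcomes[arm], outcomes[~arm])
        rejections += int(abs(t) > crit)
    return rejections / n_reps


outcomes = simulate_cluster_outcomes(30, np.random.default_rng(1))
rate = treatment_blind_type1_error(outcomes)
print(f"Estimated type-I error: {rate:.3f}")
```

Running the same loop for each candidate estimator or critical value, before unblinding, gives an objective basis for choosing which one to prespecify.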