Department of Statistics, George Washington University, Washington, DC.
Department of Statistics, Purdue University, West Lafayette, Indiana.
Stat Med. 2024 Jul 10;43(15):2894-2927. doi: 10.1002/sim.10075. Epub 2024 May 13.
Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique used to construct standard errors and confidence intervals of estimators. Its application, however, can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this article, we introduce a new bootstrap algorithm called causal bag of little bootstraps for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of the 95% confidence intervals, and computational time in a simulation study. We apply it in the evaluation of the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
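To make the idea concrete, the following is a minimal sketch of the generic bag of little bootstraps applied to a simple causal estimator. It is not the causal bag of little bootstraps algorithm introduced in the article, whose details are not given in this abstract. The sketch assumes a randomized binary treatment and uses a weighted difference in means as the effect estimator; the function names (`weighted_diff_in_means`, `blb_ate`) and the defaults (`n_subsets`, `gamma`, `n_resamples`) are hypothetical choices made for illustration.

```python
import numpy as np


def weighted_diff_in_means(y, t, w):
    """Weighted difference in mean outcome, treated (t == 1) minus control (t == 0)."""
    treated = t == 1
    control = ~treated
    mean_treated = np.sum(w[treated] * y[treated]) / np.sum(w[treated])
    mean_control = np.sum(w[control] * y[control]) / np.sum(w[control])
    return mean_treated - mean_control


def blb_ate(y, t, n_subsets=10, gamma=0.7, n_resamples=100, seed=0):
    """Generic bag of little bootstraps for a difference-in-means effect estimate.

    Assumption: this is the standard (non-causal) BLB recipe applied to a
    simple estimator, not the algorithm proposed in the article.
    Returns (point_estimate, standard_error), each averaged over subsets.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    b = int(n ** gamma)  # subset size, much smaller than n
    point_estimates, std_errors = [], []
    for _ in range(n_subsets):
        # Draw a small subset of b distinct units from the full data.
        idx = rng.choice(n, size=b, replace=False)
        y_s, t_s = y[idx], t[idx]
        estimates = np.empty(n_resamples)
        for r in range(n_resamples):
            # Multinomial counts summing to n emulate a size-n resample
            # while only ever touching the b units in the subset.
            w = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates[r] = weighted_diff_in_means(y_s, t_s, w)
        point_estimates.append(estimates.mean())
        std_errors.append(estimates.std(ddof=1))
    # Average the per-subset summaries to obtain the final estimates.
    return float(np.mean(point_estimates)), float(np.mean(std_errors))
```

Each resample costs work proportional to the subset size b rather than the full sample size n, which is the source of the computational savings over the traditional bootstrap. An approximate 95% interval could then be formed as the point estimate plus or minus 1.96 times the returned standard error, assuming approximate normality of the estimator.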