1 University of California, Berkeley, CA, USA.
2 Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA.
Stat Methods Med Res. 2019 Feb;28(2):532-554. doi: 10.1177/0962280217729845. Epub 2017 Sep 22.
Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of these features is optimized for the sake of constructing a well-behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them to achieve a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the general template of the collaborative targeted minimum loss-based estimation (C-TMLE) procedure. The original instantiation of the C-TMLE template can be presented as a greedy forward stepwise C-TMLE algorithm. It does not scale well when the number p of covariates increases drastically. This motivates the introduction of a novel instantiation of the C-TMLE template in which the covariates are pre-ordered. Its time complexity is O(p), as opposed to O(p²) for the original, a remarkable gain. We propose two pre-ordering strategies and suggest a rule of thumb for developing other meaningful strategies. Because it is usually unclear a priori which pre-ordering strategy to choose, we also introduce another instantiation, called the SL-C-TMLE algorithm, that enables a data-driven choice of the better pre-ordering strategy for the problem at hand. Its time complexity is O(p) as well. The computational burden and relative performance of these algorithms were compared in simulation studies involving fully synthetic data or partially synthetic data based on a real-world large electronic health database, and in analyses of three real, large electronic health databases. In all analyses involving electronic health databases, the greedy C-TMLE algorithm was unacceptably slow. Simulation studies seem to indicate that our scalable C-TMLE and SL-C-TMLE algorithms work well. All C-TMLEs are publicly available in a Julia software package.
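The complexity gap between the greedy and pre-ordered instantiations can be illustrated by counting candidate evaluations. The following is a minimal sketch, not the authors' implementation and not the API of their Julia package: the function names, the toy loss, and the `WEIGHTS` values are all hypothetical, and a real C-TMLE would fit a propensity score estimator and a targeting step at each evaluation rather than sum fixed numbers. The sketch only shows why trying every remaining covariate at every step costs on the order of p² evaluations, while walking through a fixed ordering costs p.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "importance" of each covariate (assumption, for illustration only).
WEIGHTS = [0.5, 2.0, 1.0, 3.0]


def toy_loss(subset: List[int]) -> float:
    """Stand-in for an expensive model fit: lower is better."""
    return -sum(WEIGHTS[j] for j in subset)


def greedy_forward(p: int, loss: Callable[[List[int]], float]) -> Tuple[List[int], int]:
    """Greedy forward stepwise selection: at each step, evaluate every
    remaining covariate and add the one giving the lowest loss."""
    selected: List[int] = []
    remaining = list(range(p))
    n_evals = 0
    while remaining:
        best_j, best_loss = None, None
        for j in remaining:
            n_evals += 1  # one candidate fit per remaining covariate
            current = loss(selected + [j])
            if best_loss is None or current < best_loss:
                best_j, best_loss = j, current
        selected.append(best_j)
        remaining.remove(best_j)
    return selected, n_evals


def preordered(p: int, rank_key: Callable[[int], float],
               loss: Callable[[List[int]], float]) -> Tuple[List[int], int]:
    """Pre-ordered selection: sort the covariates once, then add them
    one at a time in that fixed order (one candidate fit per step)."""
    order = sorted(range(p), key=rank_key)
    selected: List[int] = []
    n_evals = 0
    for j in order:
        selected.append(j)
        loss(selected)
        n_evals += 1
    return order, n_evals


if __name__ == "__main__":
    p = len(WEIGHTS)
    print(greedy_forward(p, toy_loss))                        # p(p+1)/2 evaluations
    print(preordered(p, lambda j: -WEIGHTS[j], toy_loss))     # p evaluations
```

In this toy setting both procedures visit the covariates in the same order, because the hypothetical ranking key happens to match the toy loss; in practice a pre-ordering is a heuristic and need not reproduce the greedy path, which is the trade-off the data-driven choice among pre-ordering strategies is meant to address.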