Li Sijia, Luedtke Alex
Department of Biostatistics, University of Washington, Seattle, Washington 98195.
Department of Statistics, University of Washington, Box 354322, Seattle, Washington 98195.
Biometrika. 2023 Dec;110(4):1041-1054. doi: 10.1093/biomet/asad007. Epub 2023 Feb 6.
We aim to make inferences about a smooth, finite-dimensional parameter by fusing data from multiple sources. Previous works have studied the estimation of a variety of parameters in similar data fusion settings, including the average treatment effect and the average reward under a policy; most of them merge one historical data source containing covariates, actions, and rewards with one data source containing the same covariates. In this work, we consider the general case where one or more data sources align with each part of the distribution of the target population, for example, the conditional distribution of the reward given actions and covariates. We describe potential gains in efficiency that can arise from fusing these data sources in a single analysis, which we characterize by a reduction in the semiparametric efficiency bound. We also provide a general means to construct estimators that achieve these bounds. In numerical simulations, we illustrate marked improvements in efficiency from using our proposed estimators rather than their natural alternatives. Finally, we illustrate the magnitude of efficiency gains that can be realized in vaccine immunogenicity studies by fusing data from two HIV vaccine trials.