Confidence intervals for policy evaluation in adaptive experiments.

Affiliations

Stanford Graduate School of Business, Stanford University, Stanford, CA 94305.

Publication information

Proc Natl Acad Sci U S A. 2021 Apr 13;118(15). doi: 10.1073/pnas.2014602118.

DOI:10.1073/pnas.2014602118

PMID:33876748

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8054003/

Abstract

Adaptive experimental designs can dramatically improve efficiency in randomized trials. But with adaptively collected data, common estimators based on sample means and inverse propensity-weighted means can be biased or heavy-tailed. This poses statistical challenges, in particular when the experimenter would like to test hypotheses about parameters that were not targeted by the data-collection mechanism. In this paper, we present a class of test statistics that can handle these challenges. Our approach is to adaptively reweight the terms of an augmented inverse propensity-weighting estimator to control the contribution of each term to the estimator's variance. This scheme reduces overall variance and yields an asymptotically normal test statistic. We validate the accuracy of the resulting estimates and their CIs in numerical experiments and show that our methods compare favorably to existing alternatives in terms of mean squared error, coverage, and CI size.
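The reweighting idea can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the weights h_t = sqrt(e_t) (one simple choice that depends only on the known assignment probability), the running-mean plug-in for the arm mean, the self-normalized variance estimate, and the function names `adaptively_weighted_aipw` and `simulate_epsilon_greedy`. It is not presented as the authors' exact weighting scheme.

```python
import math
import random


def adaptively_weighted_aipw(arms, rewards, propensities, target_arm):
    """Weighted AIPW estimate of one arm's mean reward, with a standard error.

    arms[t]         : arm pulled at step t
    rewards[t]      : reward observed at step t
    propensities[t] : probability that target_arm was pulled at step t
                      (known to the experimenter, set from past data only)
    """
    weights, scores = [], []
    n_seen, reward_sum = 0, 0.0  # running stats of past pulls of target_arm
    for w, y, e in zip(arms, rewards, propensities):
        # Plug-in mean uses only observations made before step t.
        m_hat = reward_sum / n_seen if n_seen else 0.0
        pulled = 1.0 if w == target_arm else 0.0
        # AIPW score: plug-in plus inverse-propensity-weighted residual.
        scores.append(m_hat + pulled / e * (y - m_hat))
        # Assumed weight: sqrt(e) damps terms collected under small
        # propensities, which are the ones that inflate the variance.
        weights.append(math.sqrt(e))
        if pulled:
            n_seen += 1
            reward_sum += y
    total = sum(weights)
    estimate = sum(h * g for h, g in zip(weights, scores)) / total
    variance = sum((h * (g - estimate)) ** 2
                   for h, g in zip(weights, scores)) / total ** 2
    return estimate, math.sqrt(variance)


def simulate_epsilon_greedy(true_means, horizon, epsilon, target_arm, rng):
    """Collect data with a hypothetical epsilon-greedy bandit (Gaussian noise)."""
    k = len(true_means)
    counts, sums = [0] * k, [0.0] * k
    arms, rewards, props = [], [], []
    for _ in range(horizon):
        means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
        greedy = max(range(k), key=lambda a: means[a])
        probs = [epsilon / k + (1 - epsilon) * (a == greedy) for a in range(k)]
        arm = rng.choices(range(k), weights=probs)[0]
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        arms.append(arm)
        rewards.append(reward)
        props.append(probs[target_arm])  # recorded before seeing the reward
    return arms, rewards, props


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate_epsilon_greedy([0.0, 0.5], 2000, 0.2, 0, rng)
    est, se = adaptively_weighted_aipw(*data, target_arm=0)
    print(f"estimate={est:.3f}, 95% CI=({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```

Under these assumptions, an approximate 95% CI is the estimate plus or minus 1.96 standard errors. Setting every weight to 1 instead would give the unweighted augmented inverse propensity-weighting estimator that the reweighting is meant to stabilize.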
