Robert Greevy, Bo Lu, Jeffrey H. Silber, Paul Rosenbaum
Department of Statistics, The Wharton School, University of Pennsylvania, 400 Jon M. Huntsman Hall, 3730 Walnut Street, Philadelphia, PA 19104-6340, USA.
Biostatistics. 2004 Apr;5(2):263-75. doi: 10.1093/biostatistics/5.2.263.
Although blocking or pairing before randomization is a basic principle of experimental design, the principle is almost invariably applied to at most one or two blocking variables. Here, we discuss the use of optimal multivariate matching prior to randomization to improve covariate balance for many variables at the same time, presenting an algorithm and a case study of its performance. The method is useful when all subjects, or large groups of subjects, are randomized at the same time. Optimal matching divides a single group of 2n subjects into n pairs so as to minimize covariate differences within pairs (the so-called nonbipartite matching problem); then one subject in each pair is picked at random for treatment, and the other is assigned to control. Using the baseline covariate data for 132 patients from an actual, unmatched, randomized experiment, we construct 66 pairs matching on 14 covariates. We then create 10,000 unmatched and 10,000 matched randomized experiments by repeatedly randomizing the 132 patients, and compare covariate balance with and without matching. By every measure, every one of the 14 covariates was substantially better balanced when randomization was performed within matched pairs. Even after covariance adjustment for chance imbalances in the 14 covariates, matched randomizations provided more accurate estimates than unmatched randomizations, the increase in accuracy being equivalent, on average, to a 7% increase in sample size. In randomization tests of no treatment effect, matched randomizations using the signed rank test had substantially higher power than unmatched randomizations using the rank sum test, even when only 2 of the 14 covariates were relevant to a simulated response. Unmatched randomizations experienced rare disasters that were consistently avoided by matched randomizations.
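The pair-then-randomize design described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm. The distance is assumed to be a standardized Euclidean distance (each covariate divided by its sample SD). The optimal pairing is found by an exact bitmask dynamic program that only scales to roughly 20 subjects. The data are a hypothetical synthetic set of 16 subjects, not the study's 132 patients. Balance is summarized, again as an assumption, by the largest absolute standardized difference in means across covariates, with its 99th percentile standing in for the "rare disasters" tail. All helper names (`optimal_nonbipartite_pairs`, `randomize_within_pairs`, and so on) are hypothetical.

```python
import math
import random
import statistics


def standardized_distance(x, y, sds):
    """Euclidean distance after dividing each covariate by its SD (assumed metric)."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sds)))


def optimal_nonbipartite_pairs(covariates):
    """Exact minimum-total-distance pairing of 2n subjects via bitmask DP.

    Exponential in the number of subjects, so only practical for small
    samples; a sketch, not the solver used in the paper.
    """
    m = len(covariates)
    if m % 2:
        raise ValueError("need an even number of subjects")
    sds = [statistics.stdev(col) or 1.0 for col in zip(*covariates)]
    dist = [[standardized_distance(covariates[i], covariates[j], sds)
             for j in range(m)] for i in range(m)]
    full = (1 << m) - 1
    # best[mask] = (cost, pairs) for an optimal pairing of the subjects in mask
    best = {0: (0.0, [])}
    for mask in range(1, full + 1):
        if bin(mask).count("1") % 2:
            continue  # an odd-sized set cannot be split into pairs
        i = (mask & -mask).bit_length() - 1  # lowest-index subject in mask
        rest = mask & ~(1 << i)
        choice = None
        for j in range(i + 1, m):
            if rest >> j & 1:
                # Pair i with j, then reuse the optimum for the remaining set
                sub_cost, sub_pairs = best[rest & ~(1 << j)]
                cost = sub_cost + dist[i][j]
                if choice is None or cost < choice[0]:
                    choice = (cost, sub_pairs + [(i, j)])
        best[mask] = choice
    return best[full][1]


def randomize_within_pairs(pairs, rng):
    """Fair coin in each pair: one subject treated (1), the other control (0)."""
    assignment = {}
    for a, b in pairs:
        treated = a if rng.random() < 0.5 else b
        assignment[a] = int(a == treated)
        assignment[b] = int(b == treated)
    return assignment


def randomize_unmatched(m, rng):
    """Complete randomization: m/2 subjects treated, chosen at random."""
    treated = set(rng.sample(range(m), m // 2))
    return {i: int(i in treated) for i in range(m)}


def abs_std_diffs(covariates, assignment):
    """|mean(treated) - mean(control)| / overall SD, for each covariate."""
    out = []
    for col in zip(*covariates):
        t = [v for i, v in enumerate(col) if assignment[i]]
        c = [v for i, v in enumerate(col) if not assignment[i]]
        out.append(abs(statistics.mean(t) - statistics.mean(c))
                   / (statistics.stdev(col) or 1.0))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 16 subjects, 3 covariates
    X = [[rng.gauss(50, 10), rng.gauss(0, 1), rng.random()] for _ in range(16)]
    pairs = optimal_nonbipartite_pairs(X)
    reps = 2000
    matched = sorted(max(abs_std_diffs(X, randomize_within_pairs(pairs, rng)))
                     for _ in range(reps))
    unmatched = sorted(max(abs_std_diffs(X, randomize_unmatched(len(X), rng)))
                       for _ in range(reps))
    for label, worst in (("matched", matched), ("unmatched", unmatched)):
        print(label, "mean:", round(statistics.mean(worst), 3),
              "99th pct:", round(worst[int(0.99 * reps)], 3))
```

For realistic sample sizes such as 132 subjects, the exponential dynamic program would be replaced by a polynomial-time solver for minimum-weight perfect matching in a general graph, such as an implementation of Edmonds' blossom algorithm; the within-pair coin flip and the balance comparison stay the same.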