A simulation study for comparing testing statistics in response-adaptive randomization.

Author information

Department of Biostatistics, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, PO Box 301402, Unit 1411, Houston, Texas 77230-1402, USA.

Publication information

BMC Med Res Methodol. 2010 Jun 5;10:48. doi: 10.1186/1471-2288-10-48.

DOI:10.1186/1471-2288-10-48

PMID:20525382

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2911470/

Abstract

BACKGROUND

Response-adaptive randomization can assign more patients in a comparative clinical trial to the tentatively better treatment. However, because patient allocation adapts to the accumulating responses, the samples to be compared are no longer independent. At large sample sizes, many asymptotic properties of test statistics derived for comparing independent samples still apply in adaptive randomization, provided that the patient allocation ratio converges asymptotically to an appropriate target. The small-sample properties of commonly used test statistics in response-adaptive randomization, however, have not been fully studied.

METHODS

Simulations are systematically conducted to characterize the statistical properties of eight test statistics under six response-adaptive randomization methods at six allocation targets, with sample sizes ranging from 20 to 200. Because adaptive randomization is usually not recommended for sample sizes below 30, the present paper focuses on a sample size of 30 to give general recommendations on test statistics for contingency tables in response-adaptive randomization at small sample sizes.
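A simulation of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors' code: it uses a randomized play-the-winner urn that starts with one ball per arm, an uncorrected Pearson chi-square statistic, and a 5% nominal level, and it counts tables with an empty margin as non-rejections. The function names (`rpw_trial`, `pearson_chi2`, `rejection_rate`) are hypothetical.

```python
import random

CHI2_CRIT_1DF = 3.841458820694124  # 95th percentile of chi-square with 1 df


def rpw_trial(p_a, p_b, n, rng):
    """Run one randomized play-the-winner trial and return its 2x2 table.

    Assumption: the urn starts with one ball per arm. A success adds a ball
    for the same arm; a failure adds a ball for the opposite arm.
    """
    balls = {"A": 1, "B": 1}
    table = {"A": [0, 0], "B": [0, 0]}  # [successes, failures] per arm
    for _ in range(n):
        prob_a = balls["A"] / (balls["A"] + balls["B"])
        arm = "A" if rng.random() < prob_a else "B"
        success = rng.random() < (p_a if arm == "A" else p_b)
        table[arm][0 if success else 1] += 1
        other = "B" if arm == "A" else "A"
        balls[arm if success else other] += 1
    return table


def pearson_chi2(table):
    """Uncorrected Pearson chi-square for a 2x2 table; None if undefined."""
    a, b = table["A"]
    c, d = table["B"]
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column margin
        return None
    return n * (a * d - b * c) ** 2 / denom


def rejection_rate(p_a, p_b, n=30, reps=5000, seed=1):
    """Monte Carlo estimate of the probability of rejecting H0: p_a == p_b."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        stat = pearson_chi2(rpw_trial(p_a, p_b, n, rng))
        if stat is not None and stat > CHI2_CRIT_1DF:
            rejections += 1
    return rejections / reps
```

Calling `rejection_rate(0.5, 0.5)` estimates the empirical size at a sample size of 30, and calling it with unequal response rates estimates power. Swapping in another statistic or another urn rule only requires replacing `pearson_chi2` or `rpw_trial`.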

RESULTS

Among all asymptotic test statistics, Cook's correction to the chi-square test (TMC) is the best at attaining the nominal size of the hypothesis test. Williams' correction to the log-likelihood-ratio test (TML) gives a slightly inflated type I error and higher power than TMC, but it is more robust against imbalance in patient allocation. TMC and TML are usually the two test statistics with the highest power across the simulation scenarios. For TMC and TML, the generalized drop-the-loser urn (GDL) and the sequential estimation-adjusted urn (SEU), respectively, are best at attaining the correct size of the hypothesis test. Among all sequential methods that can target different allocation ratios, GDL has the lowest variation and the highest overall power at all allocation ratios. The performance of the adaptive randomization methods and test statistics also depends on the allocation target. At the limiting allocation ratio of the drop-the-loser (DL) and randomized play-the-winner (RPW) urns, DL outperforms all other methods, including GDL. When the power of test statistics is compared within the same randomization method but at different allocation targets, the powers of the log-likelihood-ratio, log-relative-risk, log-odds-ratio, Wald-type Z, and chi-square test statistics are maximized at their corresponding optimal allocation ratios for power. Except for the optimal allocation target for log-relative-risk, the other four optimal targets could assign more patients to the worse arm in some simulation scenarios. Another optimal allocation target, RRSIHR, proposed by Rosenberger and Sriram (Journal of Statistical Planning and Inference, 1997), aims to minimize the number of failures at fixed power using the Wald-type Z test statistic. Among allocation ratios that always assign more patients to the better treatment, RRSIHR usually has less variation in patient allocation, and the amount of variation is consistent across all simulation scenarios. Additionally, the patient allocation at RRSIHR is not too extreme. Therefore, RRSIHR provides a good balance between assigning more patients to the better treatment and maintaining the overall power.
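The contrast between allocation targets can be made concrete with a small numerical sketch. The abstract does not state the formulas, so the three below are assumptions taken from the general response-adaptive randomization literature: the urn limit q2/(q1+q2) for RPW and DL, the Neyman allocation that maximizes the power of the Wald-type Z test, and the square-root rule commonly associated with the failure-minimizing target. Each function returns the proportion of patients assigned to arm 1 and assumes success probabilities strictly between 0 and 1. The function names are hypothetical.

```python
from math import sqrt


def urn_limit(p1, p2):
    """Assumed limiting share of arm 1 under the RPW and DL urns."""
    q1, q2 = 1 - p1, 1 - p2
    return q2 / (q1 + q2)


def neyman(p1, p2):
    """Assumed power-optimal share of arm 1 for the Wald-type Z test."""
    s1 = sqrt(p1 * (1 - p1))
    s2 = sqrt(p2 * (1 - p2))
    return s1 / (s1 + s2)


def sqrt_rule(p1, p2):
    """Assumed failure-minimizing share of arm 1 at fixed Wald-test variance."""
    return sqrt(p1) / (sqrt(p1) + sqrt(p2))


if __name__ == "__main__":
    p1, p2 = 0.9, 0.7  # arm 1 is the better treatment
    for name, target in (("urn limit", urn_limit),
                         ("Neyman", neyman),
                         ("square-root rule", sqrt_rule)):
        print(f"{name}: {target(p1, p2):.3f}")
```

With response rates of 0.9 and 0.7, the Neyman share for the better arm falls below one half, which is the kind of scenario in which a power-optimal target assigns more patients to the worse arm. The square-root rule stays above one half but is far less extreme than the urn limit.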

CONCLUSION

Cook's correction to the chi-square test and Williams' correction to the log-likelihood-ratio test are generally recommended for hypothesis testing in response-adaptive randomization, especially when sample sizes are small. The generalized drop-the-loser urn design is the recommended randomization method because of its good overall properties. The RRSIHR allocation target is also recommended.
