Tang Nian-Sheng, Yu Bin, Tang Man-Lai
Department of Statistics, Yunnan University, No. 2 Cuihu North Road, 650091 Kunming, China.
BMC Med Res Methodol. 2014 Dec 18;14:134. doi: 10.1186/1471-2288-14-134.
Because of ethical concerns, a two-arm non-inferiority trial without a placebo is usually adopted to demonstrate that an experimental treatment is not worse than a reference treatment by more than a small pre-specified non-inferiority margin. Selection of the non-inferiority margin and establishment of assay sensitivity are two major issues in the design, analysis and interpretation of two-arm non-inferiority trials. Alternatively, a three-arm non-inferiority clinical trial including a placebo is often conducted to assess the assay sensitivity and internal validity of a trial. Recently, some large-sample approaches have been developed to assess the non-inferiority of a new treatment based on the three-arm trial design. However, these methods perform poorly when the sample sizes in the three arms are small. This manuscript aims to develop reliable small-sample methods for testing three-arm non-inferiority.
Saddlepoint approximation, exact and approximate unconditional, and bootstrap-resampling methods are developed to calculate p-values of the Wald-type, score and likelihood ratio tests. Simulation studies are conducted to evaluate their performance in terms of type I error rate and power.
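For orientation, the asymptotic Wald-type test that serves as the large-sample baseline can be sketched as follows. The abstract does not state the hypotheses, so this sketch assumes the retention-of-effect formulation commonly used for three-arm designs, H0: pE - theta*pR - (1 - theta)*pP <= 0 against H1: pE - theta*pR - (1 - theta)*pP > 0, with larger response rates being better. The function name `wald_three_arm`, its arguments, and the use of unrestricted sample proportions in the variance are illustrative assumptions, not the authors' implementation.

```python
import math


def wald_three_arm(x_e, n_e, x_r, n_r, x_p, n_p, theta):
    """Wald-type statistic and one-sided asymptotic p-value.

    Assumed hypotheses (not stated in the abstract):
        H0: pE - theta*pR - (1 - theta)*pP <= 0
        H1: pE - theta*pR - (1 - theta)*pP >  0
    x_* are response counts, n_* are arm sizes (E = experimental,
    R = reference, P = placebo); theta is the fraction of the reference
    effect that the experimental treatment must retain.
    """
    p_e, p_r, p_p = x_e / n_e, x_r / n_r, x_p / n_p

    # Point estimate of the contrast being tested.
    estimate = p_e - theta * p_r - (1.0 - theta) * p_p

    # Variance of the contrast, estimated with unrestricted sample proportions.
    variance = (
        p_e * (1.0 - p_e) / n_e
        + theta ** 2 * p_r * (1.0 - p_r) / n_r
        + (1.0 - theta) ** 2 * p_p * (1.0 - p_p) / n_p
    )
    if variance <= 0.0:
        raise ValueError("estimated variance is zero; the Wald statistic is undefined")

    t = estimate / math.sqrt(variance)
    # Upper-tail standard normal probability P(Z >= t).
    p_value = 0.5 * math.erfc(t / math.sqrt(2.0))
    return t, p_value


if __name__ == "__main__":
    # Hypothetical data: 40/50, 40/50 and 20/50 responders, theta = 0.8.
    print(wald_three_arm(40, 50, 40, 50, 20, 50, theta=0.8))
```

The zero-variance guard matters precisely in the small-sample setting the paper targets, where all subjects in an arm may respond or fail to respond; the normal approximation used for the p-value is what the saddlepoint, unconditional and bootstrap methods are meant to improve upon.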
Our empirical results show that the saddlepoint approximation method generally behaves better than the asymptotic method based on the Wald-type test statistic. For small sample sizes, approximate unconditional and bootstrap-resampling methods based on the score test statistic perform better in the sense that their corresponding type I error rates are generally closer to the prespecified nominal level than those of other test procedures.
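The kind of simulation behind such type I error comparisons can be illustrated with a minimal Monte Carlo sketch. It estimates the empirical rejection rate of the asymptotic Wald-type test at the boundary of the null hypothesis, assuming the retention-of-effect formulation pE = theta*pR + (1 - theta)*pP and equal arm sizes. The function names, parameter values and the convention of counting zero-variance samples as non-rejections are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def wald_statistic(x_e, x_r, x_p, n, theta):
    """Wald-type statistic for equal arm sizes; None if the variance is zero."""
    p_e, p_r, p_p = x_e / n, x_r / n, x_p / n
    estimate = p_e - theta * p_r - (1.0 - theta) * p_p
    variance = (
        p_e * (1.0 - p_e)
        + theta ** 2 * p_r * (1.0 - p_r)
        + (1.0 - theta) ** 2 * p_p * (1.0 - p_p)
    ) / n
    if variance <= 0.0:
        return None
    return estimate / math.sqrt(variance)


def estimate_type1_error(p_r, p_p, theta, n, alpha=0.05, reps=2000, seed=1):
    """Empirical rejection rate of the asymptotic Wald test on the null boundary."""
    rng = random.Random(seed)
    # Boundary of the assumed null hypothesis: the contrast is exactly zero.
    p_e = theta * p_r + (1.0 - theta) * p_p
    critical = NormalDist().inv_cdf(1.0 - alpha)

    rejections = 0
    for _ in range(reps):
        # Draw one binomial count per arm as a sum of Bernoulli trials.
        x_e = sum(rng.random() < p_e for _ in range(n))
        x_r = sum(rng.random() < p_r for _ in range(n))
        x_p = sum(rng.random() < p_p for _ in range(n))
        t = wald_statistic(x_e, x_r, x_p, n, theta)
        # Assumed convention: an undefined statistic counts as a non-rejection.
        if t is not None and t > critical:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Hypothetical setting: 15 subjects per arm, pR = 0.7, pP = 0.3, theta = 0.8.
    print(estimate_type1_error(p_r=0.7, p_p=0.3, theta=0.8, n=15))
```

Comparing the returned rate with the nominal level `alpha` for several small values of `n` gives a rough picture of how far an asymptotic test can drift from its nominal size; the same loop could in principle be reused with a different test statistic or p-value method substituted for `wald_statistic`.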
Both approximate unconditional and bootstrap-resampling test procedures based on the score test statistic are generally recommended for three-arm non-inferiority trials with binary outcomes.