Division of Biostatistics, Kitasato University School of Pharmacy, Tokyo, 108-8641, Japan.
Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
Stat Med. 2018 Jul 10;37(15):2307-2320. doi: 10.1002/sim.7661. Epub 2018 Apr 22.
In randomized clinical trials where time-to-event is the primary outcome, almost routinely, the logrank test is prespecified as the primary test and the hazard ratio is used to quantify treatment effect. If the ratio of 2 hazard functions is not constant, the logrank test is not optimal and the interpretation of hazard ratio is not obvious. When such a nonproportional hazards case is expected at the design stage, the conventional practice is to prespecify another member of weighted logrank tests, eg, Peto-Prentice-Wilcoxon test. Alternatively, one may specify a robust test as the primary test, which can capture various patterns of difference between 2 event time distributions. However, most of those tests do not have companion procedures to quantify the treatment difference, and investigators have fallen back on reporting treatment effect estimates not associated with the primary test. Such incoherence in the "test/estimation" procedure may potentially mislead clinicians/patients who have to balance risk-benefit for treatment decision. To address this, we propose a flexible and coherent test/estimation procedure based on restricted mean survival time, where the truncation time τ is selected data dependently. The proposed procedure is composed of a prespecified test and an estimation of corresponding robust and interpretable quantitative treatment effect. The utility of the new procedure is demonstrated by numerical studies based on 2 randomized cancer clinical trials; the test is dramatically more powerful than the logrank, Wilcoxon tests, and the restricted mean survival time-based test with a fixed τ, for the patterns of difference seen in these cancer clinical trials.
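To make the quantity concrete: the restricted mean survival time up to τ is the area under the survival curve on [0, τ], and the treatment effect is the between-arm difference in that area. The following is a minimal sketch, assuming a Kaplan-Meier estimate of the survival curve and a fixed, user-supplied τ. It does not reproduce the data-dependent selection of τ or the test proposed in the paper, and the function names (`kaplan_meier`, `rmst`) and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    Returns (event_times, survival) where survival[i] is the estimated
    survival probability just after event_times[i].
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    out_t, out_s = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            out_t.append(t)
            out_s.append(surv)
        n_at_risk -= n_leaving
    return out_t, out_s


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    event_times, survival = kaplan_meier(times, events)
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(event_times, survival):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)  # last piece, up to the truncation time
    return area


# Toy data (hypothetical, for illustration only): times in months.
control_times, control_events = [2, 4, 5, 7, 9], [1, 1, 0, 1, 1]
treated_times, treated_events = [3, 6, 8, 10, 12], [1, 0, 1, 1, 0]

tau = 9.0  # fixed truncation time, chosen here by hand
difference = rmst(treated_times, treated_events, tau) - rmst(
    control_times, control_events, tau
)
print(f"RMST difference up to tau={tau}: {difference:.2f} months")
```

The difference is expressed in units of time (here, months of event-free survival gained over the first τ months), which is what makes it directly interpretable without a proportional hazards assumption.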