MRC Clinical Trials Unit, Institute of Clinical Trials and Methodology, University College London, London, UK.
Clin Trials. 2023 Dec;20(6):594-602. doi: 10.1177/17407745231181907. Epub 2023 Jun 20.
The population-level summary measure is a key component of the estimand for clinical trials with time-to-event outcomes. This is particularly the case for non-inferiority trials, because different summary measures imply different null hypotheses. Most trials are designed using the hazard ratio as the summary measure, but recent studies have suggested that the difference in restricted mean survival time might be more powerful, at least in certain situations. In a recent letter, we conjectured that differences between summary measures can be explained using the concept of the non-inferiority frontier and that, for a fair simulation comparison of summary measures, the same analysis methods, making the same assumptions, should be used to estimate the different summary measures. The aim of this article is to make such a comparison between three commonly used summary measures: hazard ratio, difference in restricted mean survival time and difference in survival at a fixed time point. In addition, we aim to investigate the impact of using an analysis method that assumes proportional hazards on the operating characteristics of a trial designed with any of the three summary measures.
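To illustrate why different summary measures imply different null hypotheses, the sketch below maps a hazard ratio onto the other two scales. It is not taken from the article: it assumes an exponential control arm (constant hazard) purely for convenience, and the function name, the control hazard, the horizon `tau` and the margin value are all hypothetical. Under proportional hazards the experimental-arm survival is the control-arm survival raised to the power of the hazard ratio, so a given hazard ratio corresponds to one difference in survival and one difference in restricted mean survival time, but those values change with the control-arm hazard.

```python
import math


def summary_measures_under_ph(control_hazard, hazard_ratio, tau):
    """Translate a hazard ratio into differences (experimental minus control).

    Assumption (illustrative only): exponential control arm with constant
    hazard `control_hazard`, and proportional hazards, so the experimental
    arm is exponential with hazard `control_hazard * hazard_ratio`.
    """
    experimental_hazard = control_hazard * hazard_ratio

    # Survival at the fixed time point tau in each arm.
    surv_control = math.exp(-control_hazard * tau)
    surv_experimental = math.exp(-experimental_hazard * tau)

    # Restricted mean survival time: area under the survival curve on [0, tau].
    rmst_control = (1.0 - surv_control) / control_hazard
    rmst_experimental = (1.0 - surv_experimental) / experimental_hazard

    return {
        "survival_difference": surv_experimental - surv_control,
        "rmst_difference": rmst_experimental - rmst_control,
    }


# The same hypothetical hazard-ratio margin of 1.3 corresponds to different
# margins on the other two scales when the control-arm hazard changes.
for hazard in (0.05, 0.20):
    print(hazard, summary_measures_under_ph(hazard, hazard_ratio=1.3, tau=5.0))
```

Because the implied margins move with the control-arm hazard, a null hypothesis fixed on one scale is not equivalent to a null hypothesis fixed on another once the control event risk departs from the value anticipated at the design stage.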
We conduct a simulation study in the proportional hazards setting. We estimate difference in restricted mean survival time and difference in survival non-parametrically, without assuming proportional hazards. We also estimate all three measures parametrically, using flexible survival regression, under the proportional hazards assumption.
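As a rough illustration of the non-parametric route, the following sketch estimates the difference in survival at a fixed time point and the difference in restricted mean survival time from Kaplan-Meier curves on one simulated data set. It is a minimal sketch and not the authors' simulation code: the exponential data-generating model, the sample size, the hazards, the administrative censoring time and all function names are assumptions made for illustration. The flexible parametric approach used in the article is not shown.

```python
import numpy as np


def kaplan_meier(time, event):
    """Return distinct event times and the Kaplan-Meier estimate just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    at_risk = np.array([(time >= t).sum() for t in event_times])
    deaths = np.array([((time == t) & event).sum() for t in event_times])
    return event_times, np.cumprod(1.0 - deaths / at_risk)


def survival_at(time, event, t):
    """Kaplan-Meier survival probability at time t."""
    event_times, surv = kaplan_meier(time, event)
    idx = np.searchsorted(event_times, t, side="right")
    return 1.0 if idx == 0 else float(surv[idx - 1])


def rmst(time, event, tau):
    """Area under the Kaplan-Meier step function on [0, tau].

    If follow-up ends before tau, the last estimate is carried forward,
    which is a simplification.
    """
    event_times, surv = kaplan_meier(time, event)
    keep = event_times < tau
    grid = np.concatenate(([0.0], event_times[keep], [tau]))
    heights = np.concatenate(([1.0], surv[keep]))
    return float(np.sum(heights * np.diff(grid)))


def simulate_arm(rng, n, hazard, censor_time):
    """Exponential event times with administrative censoring (assumed design)."""
    t = rng.exponential(scale=1.0 / hazard, size=n)
    return np.minimum(t, censor_time), t <= censor_time


rng = np.random.default_rng(2023)
tau = 5.0
t0, e0 = simulate_arm(rng, n=500, hazard=0.10, censor_time=6.0)  # control
t1, e1 = simulate_arm(rng, n=500, hazard=0.12, censor_time=6.0)  # experimental, HR = 1.2

print("difference in survival at tau:", survival_at(t1, e1, tau) - survival_at(t0, e0, tau))
print("difference in RMST up to tau:", rmst(t1, e1, tau) - rmst(t0, e0, tau))
```

Neither estimate uses the proportional hazards assumption, even though the data are generated under it. A full simulation study would repeat this over many data sets and add variance estimates and confidence limits to compare against a non-inferiority margin.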
When the hazard ratio, estimated assuming proportional hazards, is compared with the other summary measures, estimated without assuming proportional hazards, relative performance varies substantially depending on the specific scenario. For a fixed summary measure, assuming proportional hazards always leads to substantial power gains compared to using non-parametric methods. When the modelling approach is fixed to flexible parametric regression assuming proportional hazards, difference in restricted mean survival time is most often the most powerful summary measure among those considered.
When the hazards are likely to be approximately proportional, reflecting this in the analysis can lead to large gains in power for difference in restricted mean survival time and difference in survival. The choice of summary measure for a non-inferiority trial with time-to-event outcomes should be made on clinical grounds; when any of the three summary measures discussed here is equally justifiable, difference in restricted mean survival time is most often associated with the most powerful test, on the condition that it is estimated under proportional hazards.