Edland Steven D, Ard M Colin, Li Weiwei, Jiang Lingjing
Alzheimers Dement (N Y). 2017 Jun;3(2):213-218. doi: 10.1016/j.trci.2016.12.004. Epub 2017 Jan 23.
Composite scales have recently been proposed as outcome measures for clinical trials. For example, the Prodromal Alzheimer's Cognitive Composite (PACC) is the sum of z-score normed component measures assessing episodic memory, timed executive function, and global cognition. Alternative methods have been proposed that calculate the composite total score as a weighted sum of the component measures, with weights chosen to maximize the signal-to-noise ratio of the resulting composite score. Optimal weights can be estimated from pilot data, but it is an open question how large a pilot trial must be to estimate the optimal weights reliably.
In this manuscript, we describe the calculation of optimal weights and use large-scale computer simulations to investigate how large a pilot study sample is required to inform that calculation. The simulations are informed by the pattern of decline observed in cognitively normal subjects enrolled in the Alzheimer's Disease Cooperative Study (ADCS) Prevention Instrument cohort study, restricted to the n=75 subjects aged 75 and over who carry an ApoE E4 risk allele and are therefore likely to have an underlying Alzheimer neurodegenerative process.
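The weighting idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes signal-to-noise is defined as the absolute mean change of the composite divided by the standard deviation of that change, in which case the maximizing weights are proportional to the inverse covariance matrix times the mean change vector. The values of `mu` and `sigma` are illustrative placeholders (not ADCS estimates), and the function names are hypothetical.

```python
import numpy as np

def optimal_weights(mu, sigma):
    """Weights maximizing |w @ mu| / sqrt(w @ sigma @ w).

    Under this definition of signal-to-noise the solution is
    proportional to sigma^{-1} mu; the overall scale is arbitrary,
    so the vector is normalized to unit length.
    """
    w = np.linalg.solve(sigma, mu)
    return w / np.linalg.norm(w)

def snr(w, mu, sigma):
    """Signal-to-noise of the composite formed with weights w."""
    return abs(w @ mu) / np.sqrt(w @ sigma @ w)

# Illustrative placeholders only: mean decline (in z-score units) and
# covariance of decline for three component measures.
mu = np.array([0.30, 0.15, 0.10])
sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])

def pilot_weights(n, rng):
    """Estimate weights from a simulated pilot study of n subjects."""
    changes = rng.multivariate_normal(mu, sigma, size=n)
    return optimal_weights(changes.mean(axis=0), np.cov(changes, rowvar=False))

rng = np.random.default_rng(0)
equal = np.ones(3) / 3                      # unweighted sum, as in the standard composite
snr_equal = snr(equal, mu, sigma)
snr_best = snr(optimal_weights(mu, sigma), mu, sigma)
# Evaluate pilot-estimated weights against the true mu and sigma.
snr_pilot = {n: snr(pilot_weights(n, rng), mu, sigma) for n in (100, 300)}
```

Comparing `snr_pilot[n]` with `snr_equal` and `snr_best` over many simulated pilots shows how quickly estimated weights approach the best attainable signal-to-noise as the pilot sample grows.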
In the context of secondary prevention trials in Alzheimer's disease, and using the components of the PACC, we found that pilot studies with as few as 100 subjects are sufficient to meaningfully inform weighting parameters. Regardless of the pilot study sample size used to inform weights, the optimally weighted PACC consistently outperformed the standard PACC in terms of statistical power to detect treatment effects in a clinical trial. Pilot studies of 300 subjects produced weights that achieved near-optimal statistical power and reduced the required sample size relative to the standard PACC by more than half.
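The link between signal-to-noise and required sample size can be made concrete with the standard normal-approximation formula for a two-arm comparison of mean change. The sketch below is an assumption-laden illustration, not the power calculation used in the simulations: the treatment is assumed to slow mean decline by a fixed fraction `effect`, and the signal-to-noise value 0.3 is a hypothetical input.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(snr, effect=0.25, alpha=0.05, power=0.80):
    """Subjects per arm for a two-sided, two-sample z-test.

    The standardized treatment difference is effect * snr, where snr is
    mean decline divided by the standard deviation of decline.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * (z / (effect * snr)) ** 2)

base = n_per_arm(0.3)                 # hypothetical unweighted composite
improved = n_per_arm(0.3 * sqrt(2))   # signal-to-noise raised by a factor of sqrt(2)
ratio = improved / base
```

Because required sample size scales with the inverse square of signal-to-noise, halving the sample size corresponds to raising signal-to-noise by a factor of about 1.41 under these assumptions.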
These simulations suggest that modestly sized pilot studies, comparable in size to a phase 2 clinical trial, are sufficient to inform the construction of composite outcome measures. Although these findings apply only to the PACC in the context of prodromal AD, the observation that weights need only approximate the optimal weights to achieve near-optimal performance should generalize. Performing a pilot study or phase 2 trial to inform the weighting of proposed composite outcome measures is highly cost-effective. The net effect of more efficient outcome measures is that smaller trials will be required to test novel treatments. Alternatively, second-generation trials can use prior clinical trial data to inform weighting, so that greater efficiency can be achieved as the field moves forward.