Sloan Jeff A, Sargent Daniel J, Novotny Paul J, Decker Paul A, Marks Randolph S, Nelson Heidi
Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA.
J Pain Symptom Manage. 2014 Jun;47(6):1091-1099.e3. doi: 10.1016/j.jpainsymman.2013.07.016. Epub 2013 Nov 15.
Quality-adjusted life year (QALY) estimation is a well-known but little-used technique for comparing survival adjusted for complications. A lack of calibration and interpretation guidance hinders the implementation of QALY analyses.
We conducted simulation studies to assess the impact of differences in survival, toxicity rates, and utility values on QALY results.
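As a rough illustration of how survival, toxicity, and utility values combine into a QALY figure, the following minimal sketch weights each day of survival by a utility. The function name `qaly_years`, the utility of 1.0 for toxicity-free days, and the 365.25-day year are assumptions made for illustration; only the utility of 0.3 for days with toxicity comes from the abstract, and this is not necessarily the calculation the authors used.

```python
DAYS_PER_YEAR = 365.25  # assumed conversion factor, not taken from the abstract


def qaly_years(survival_days, toxic_days, toxic_utility=0.3, healthy_utility=1.0):
    """Utility-weighted survival in years (hypothetical helper).

    toxic_utility=0.3 follows the abstract; healthy_utility=1.0 is an assumption.
    """
    if not 0 <= toxic_days <= survival_days:
        raise ValueError("toxic_days must lie between 0 and survival_days")
    healthy_days = survival_days - toxic_days
    # Each day contributes its utility weight to the quality-adjusted total.
    adjusted_days = healthy_utility * healthy_days + toxic_utility * toxic_days
    return adjusted_days / DAYS_PER_YEAR


if __name__ == "__main__":
    # Two years of survival, 10% of which is spent with toxicity.
    print(round(qaly_years(730.5, 73.05), 2))
```

Under these assumptions, a patient who survives two years and spends 10% of that time with toxicity accrues about 1.86 QALYs rather than 2.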
Survival comparisons used both log-rank and Wilcoxon testing. We examined power considerations for a North Central Cancer Treatment Group Phase III lung cancer clinical trial (89-20-52).
Sample sizes of 100 events per treatment have low power to generate a statistically significant difference in QALYs unless the toxicity rate is 44% higher in one arm. For sample sizes of 200 per arm and equal survival times, toxicity must be at least 38% higher in one arm for the result to be statistically significant, using a utility of 0.3 for days with toxicity. Sample sizes of 300 and 500 per arm provide 80% power if the toxicity difference is 31% and 25%, respectively. If the overall survival hazard ratio between the two treatment arms is 1.25, then samples of at least 150 patients and 13% increased toxicity are necessary to have 80% power to detect QALY differences. In study 89-20-52, there was only 56% power to establish the statistical significance of the observed QALY differences, which clarifies the otherwise enigmatic conclusion of no statistically significant difference in QALY despite an observed 14.5% increase in toxicity between treatments.
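The kind of power calculation described above can be sketched with a small Monte Carlo simulation. Everything in this sketch beyond the utility of 0.3 is an assumption for illustration: exponential survival times, a fixed fraction of survival spent with toxicity among affected patients, no censoring, and a Welch-type z-test on mean quality-adjusted days in place of the log-rank and Wilcoxon tests named in the abstract. It is not the authors' simulation design and should not be expected to reproduce their figures.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_arm(n, hazard, tox_rate, rng, tox_utility=0.3, tox_fraction=0.5):
    """Quality-adjusted survival days for n simulated patients (hypothetical model)."""
    values = []
    for _ in range(n):
        t = rng.expovariate(hazard)  # assumed exponential survival, no censoring
        if rng.random() < tox_rate:
            toxic = tox_fraction * t  # assumed share of survival spent with toxicity
            values.append((t - toxic) + tox_utility * toxic)
        else:
            values.append(t)
    return values


def estimate_power(n_per_arm, tox_a, tox_b, hazard_ratio=1.0,
                   median_days=365.0, n_sims=500, alpha=0.05, seed=0):
    """Fraction of simulated trials in which a two-sided z-test rejects equality."""
    rng = random.Random(seed)
    base_hazard = math.log(2) / median_days
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        a = simulate_arm(n_per_arm, base_hazard, tox_a, rng)
        b = simulate_arm(n_per_arm, base_hazard * hazard_ratio, tox_b, rng)
        # Welch-type standard error of the difference in mean adjusted days.
        se = math.sqrt(variance(a) / n_per_arm + variance(b) / n_per_arm)
        z = (mean(a) - mean(b)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for n in (100, 200, 300, 500):
        print(n, estimate_power(n, tox_a=0.10, tox_b=0.40))
```

Varying `n_per_arm`, the two toxicity rates, and `hazard_ratio` shows how the estimated power moves with each input under these assumptions, which is the same kind of sensitivity the calibration results summarize.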
This calibration allows researchers to interpret the clinical significance of QALY analyses and facilitates QALY inclusion in clinical trials through improved study design.