West Midlands Health Technology Assessment Collaboration (WMHTAC), Unit of Public Health, Epidemiology and Biostatistics, University of Birmingham, West Midlands, UK.
Pharmacoeconomics. 2011 Oct;29(10):827-37. doi: 10.2165/11585940-000000000-00000.
The UK National Institute for Health and Clinical Excellence (NICE) has used its Single Technology Appraisal (STA) programme to assess several drugs for cancer. Typically, the evidence submitted by the manufacturer comes from a single short-term randomized controlled trial (RCT) demonstrating improvement in overall survival and/or delay in disease progression, and these outcomes are the pre-eminent drivers of cost effectiveness. We draw attention to key issues encountered in assessing the quality and rigour of the manufacturers' modelling of overall survival and disease progression. Our examples are two recent STAs: sorafenib (Nexavar®) for advanced hepatocellular carcinoma, and azacitidine (Vidaza®) for higher-risk myelodysplastic syndromes (MDS). The choice of parametric model had a large effect on the predicted treatment-dependent survival gain: logarithmic models (log-normal and log-logistic) delivered double the survival advantage derived from Weibull models. Both submissions selected the logarithmic fits for their base-case economic analyses and justified the selection solely on Akaike Information Criterion (AIC) scores. In the azacitidine submission, the AIC scores did not support the choice of the log-logistic model over the Weibull or exponential models, and the modelled survival in the intervention arm lacked face validity. AIC scores for the sorafenib models favoured log-normal fits; however, since there is no statistical method for comparing AIC scores, and differences may be trivial, it is generally advised that the plausibility of competing models be tested against external data and explored in diagnostic plots. Fitting functions to observed data should not be a mechanical process validated by a single crude indicator (AIC). Models used to project survival beyond the trial should be clearly plausible for the patients concerned and consistent with other published information. Multiple parametric functions, rather than a single one, should be explored and tested with diagnostic plots.
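The diagnostic-plot advice can be made concrete with a small sketch. The Python below (standard library only; every function name is hypothetical and not taken from either submission) computes a Kaplan-Meier curve from right-censored data and applies the usual linearising transforms against log time: an approximately straight line in log(-log S) suggests a Weibull model (slope 1 for exponential), in log((1 - S)/S) a log-logistic model, and in the inverse normal of (1 - S) a log-normal model. Plotting itself is left out; the returned rows can be passed to any plotting tool.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (t, S(t)) pairs at each distinct event time.

    events[i] is 1 if an event (death/progression) was observed at times[i],
    0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Handle ties: everyone observed at time t leaves the risk set together,
        # but patients censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def diagnostic_points(curve: List[Tuple[float, float]]) -> List[Tuple[float, float, float, float]]:
    """Transform (t, S) into coordinates that are linear in log t when the
    named model fits:

    - Weibull (exponential if slope == 1): log(-log S)
    - log-logistic:                         log((1 - S) / S)
    - log-normal:                           Phi^{-1}(1 - S)

    Returns rows of (log t, weibull_y, loglogistic_y, lognormal_y).
    """
    phi_inv = NormalDist().inv_cdf
    rows = []
    for t, s in curve:
        if t > 0 and 0.0 < s < 1.0:  # transforms are undefined at S = 0 or 1
            rows.append((math.log(t), math.log(-math.log(s)),
                         math.log((1.0 - s) / s), phi_inv(1.0 - s)))
    return rows


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope; on a Weibull plot it approximates the shape parameter."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Exact Weibull survival (hypothetical shape 1.5, scale 2): the Weibull
    # column is a straight line whose slope equals the shape.
    demo = [(t, math.exp(-(t / 2.0) ** 1.5)) for t in (0.5, 1.0, 2.0, 4.0)]
    rows = diagnostic_points(demo)
    print(round(ols_slope([r[0] for r in rows], [r[1] for r in rows]), 6))  # 1.5
```

With real trial data, the point of the exercise is visual: if none of the three columns is close to linear, no single standard parametric form fits well, whatever the AIC ranking says.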
When trials have survival curves with long tails exhibiting few events, the robustness of extrapolations that rely on information in those tails should be tested.
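One possible way to probe the long-tail issue is sketched below. It assumes a Weibull model purely for illustration; log-logistic and log-normal fits would need a general numerical optimiser and are omitted. The hypothetical helpers fit exponential and Weibull models to right-censored data by maximum likelihood, report AIC = 2p - 2 ln L (p = number of parameters), and compare the extrapolated Weibull mean survival from the full data with that from a refit in which follow-up is administratively censored at a cutoff, so that the sparse tail is discarded. The cutoff and the toy data are assumptions, not values from the sorafenib or azacitidine submissions.

```python
import math
from typing import List, Tuple


def fit_exponential(times: List[float], events: List[int]) -> Tuple[float, float]:
    """Closed-form MLE of the exponential rate; returns (rate, log-likelihood)."""
    d, total = sum(events), sum(times)
    rate = d / total
    return rate, d * math.log(rate) - rate * total


def fit_weibull(times: List[float], events: List[int]) -> Tuple[float, float, float]:
    """MLE of S(t) = exp(-(t/scale)**shape) for right-censored data.

    Solves the profile score equation for the shape by bisection, then
    computes the scale in closed form. Assumes all times > 0 and >= 1 event.
    Returns (shape, scale, log-likelihood).
    """
    d = sum(events)
    c = max(times)
    ts = [t / c for t in times]  # rescale so t**k cannot overflow
    mean_log_event = sum(math.log(t) for t, e in zip(ts, events) if e) / d

    def score(k: float) -> float:
        # Increasing in k; its root is the MLE of the shape.
        w = [t ** k for t in ts]
        return sum(wi * math.log(t) for wi, t in zip(w, ts)) / sum(w) - 1.0 / k - mean_log_event

    lo, hi = 1e-3, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) < 0 else (lo, mid)
    k = 0.5 * (lo + hi)
    scale = c * (sum(t ** k for t in ts) / d) ** (1.0 / k)
    loglik = (sum(math.log(k) - k * math.log(scale) + (k - 1.0) * math.log(t)
                  for t, e in zip(times, events) if e)
              - sum((t / scale) ** k for t in times))
    return k, scale, loglik


def aic(loglik: float, n_params: int) -> float:
    return 2.0 * n_params - 2.0 * loglik


def weibull_mean(shape: float, scale: float) -> float:
    """Mean survival (area under the full extrapolated curve)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def censor_at(times: List[float], events: List[int], cutoff: float) -> Tuple[List[float], List[int]]:
    """Administratively censor follow-up at `cutoff`, discarding the tail."""
    return ([min(t, cutoff) for t in times],
            [e if t <= cutoff else 0 for t, e in zip(times, events)])


def tail_sensitivity(times: List[float], events: List[int], cutoff: float) -> Tuple[float, float]:
    """Extrapolated Weibull mean survival with and without the tail beyond cutoff."""
    k_full, s_full, _ = fit_weibull(times, events)
    k_cut, s_cut, _ = fit_weibull(*censor_at(times, events, cutoff))
    return weibull_mean(k_full, s_full), weibull_mean(k_cut, s_cut)


if __name__ == "__main__":
    # Hypothetical toy data (months); NOT taken from either STA submission.
    t = [2, 3, 4, 5, 6, 8, 9, 11, 14, 20, 26, 30]
    e = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0]
    _, ll_exp = fit_exponential(t, e)
    _, _, ll_wei = fit_weibull(t, e)
    print("AIC exponential:", round(aic(ll_exp, 1), 2), "Weibull:", round(aic(ll_wei, 2), 2))
    print("Mean survival, full vs censored at 12:", tail_sensitivity(t, e, 12))
```

A large shift between the two means suggests the extrapolation leans heavily on the few events in the tail; similar AIC values across candidate models do not by themselves reveal this.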