Ebbutt A F, Frith L
European Clinical Statistics, Glaxo Wellcome, Middlesex, U.K.
Stat Med. 1998;17(15-16):1691-701. doi: 10.1002/(sici)1097-0258(19980815/30)17:15/16<1691::aid-sim971>3.0.co;2-j.
Equivalence trials aim to show that two treatments have equivalent therapeutic effects. The approach is to define, in advance, a range of equivalence -d to +d for the treatment difference such that any value in the range is clinically unimportant. If the confidence interval for the difference, calculated after the trial, lies entirely within this range, then equivalence is claimed. Glaxo Wellcome has carried out a series of trials using this methodology to assess new formulations of inhaled beta-agonists and inhaled steroids in asthma. Eleven of these trials are used to review some practical issues in equivalence trials. For the series of asthma trials, a range for peak expiratory flow rate (PEF) from -15 to +15 l/min was chosen to be the range of equivalence. This fitted well with physicians' opinions and with previously demonstrated differences between active and placebo. The choice of confidence level for the interval should depend on the medical severity of the clinical endpoints under consideration and the level of risk acceptable in assuming equivalence if a difference of potential importance exists. From this point of view, a recommendation in the CPMP Note for Guidance on Biostatistics that 95 per cent confidence intervals should be used is inappropriate. Intent-to-treat (ITT) and per-protocol (PP) analyses were compared for the eleven asthma trials. Confidence intervals were always wider for the PP analysis and this was entirely due to the smaller number of subjects included in the PP analysis. There was no evidence that the ITT analyses were more conservative in their estimates of treatment difference. The need to demonstrate equivalence in both an ITT and a PP analysis in a regulatory trial increases the regulatory burden on drug developers. The relative importance of the two analyses will depend on the definitions used in particular therapeutic areas. Demonstrating equivalence in one population with strong support from the other would be preferred from the industry viewpoint. In trials with regulatory importance, prior agreement with regulators on the role of ITT and PP populations should be sought. Trial designs will need to take account of the estimated size of the PP population if adequate power is needed for both analyses. Careful design in the series of asthma trials, particularly identifying a population of patients with potential to improve, resulted in notable increases in lung function during the course of the trials for both treatments. This provided reassurance that equivalence was not due to a lack of efficacy for both treatments. In one trial equivalence was demonstrated overall but a treatment-by-country interaction was noted. However, this interaction could not be attributed to differences in patient characteristics or baseline data between the countries. Study conduct was also similar in the different countries. The conclusion was that the interaction was spurious and that the trial provided good evidence of equivalence.
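The decision rule described in the abstract (claim equivalence only if the confidence interval for the treatment difference lies entirely within -d to +d) can be illustrated with a minimal sketch. The abstract does not state how the intervals were computed, so the large-sample normal approximation used here is an assumption, as are the function name `equivalence_check` and the example difference and standard errors, which are illustrative values and not results from the trials. Only the ±15 l/min range for PEF comes from the text.

```python
from statistics import NormalDist


def equivalence_check(diff, se, margin=15.0, level=0.95):
    """Two-sided CI for a treatment difference and an equivalence verdict.

    diff   : estimated treatment difference (e.g. PEF in l/min)
    se     : standard error of that difference
    margin : half-width d of the pre-specified equivalence range (-d, +d)
    level  : confidence level of the interval

    Assumes a normal approximation for the estimate (an assumption of
    this sketch, not something stated in the abstract).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = diff - z * se
    upper = diff + z * se
    # Equivalence is claimed only if the whole interval is inside (-d, +d).
    equivalent = (-margin < lower) and (upper < margin)
    return lower, upper, equivalent


# Hypothetical numbers, for illustration only.
for level in (0.95, 0.90):
    lo, hi, ok = equivalence_check(diff=3.0, se=7.0, level=level)
    print(f"{level:.0%} CI: ({lo:.1f}, {hi:.1f})  equivalent: {ok}")
```

With these hypothetical inputs the 95 per cent interval crosses +15 l/min while the 90 per cent interval does not, which shows why the choice of confidence level matters for the conclusion. A larger standard error, such as would arise from the smaller PP population, widens the interval in the same way.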