Sayers Adrian, Crowther Michael J, Judge Andrew, Whitehouse Michael R, Blom Ashley W
Musculoskeletal Research Unit, School of Clinical Sciences, University of Bristol, Bristol, UK.
School of Social and Community Medicine, University of Bristol, Bristol, UK.
BMJ Open. 2017 Aug 28;7(8):e015397. doi: 10.1136/bmjopen-2016-015397.
The use of benchmarks to assess the performance of implants, such as those used in arthroplasty surgery, is a widespread practice. It provides surgeons, patients and regulatory authorities with the reassurance that the implants used are safe and effective. However, it is not currently clear how implants should be statistically compared with a benchmark, or how many are needed, to assess whether an implant is superior, equivalent, non-inferior or inferior to the performance benchmark of interest. We aim to describe the methods and sample size required to conduct a one-sample non-inferiority study of a medical device for the purposes of benchmarking.
Simulation study.
Simulation study of a national register of medical devices.
We simulated data, with and without a non-informative competing risk, to represent an arthroplasty population and describe three methods of analysis (z-test, 1-Kaplan-Meier and competing risks) commonly used in surgical research.
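The distinction between net failure (1-Kaplan-Meier, with competing events treated as censoring) and crude failure (cumulative incidence from a competing risks analysis) can be illustrated with a minimal sketch. This is not the authors' simulation code: the function name `net_and_crude_failure`, the cause coding, the exponential hazards, the 10-year horizon and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

def net_and_crude_failure(time, cause, t_eval):
    """Return (net, crude) failure at t_eval.

    cause: 0 = censored, 1 = implant failure, 2 = competing event (e.g. death).
    net   = 1 - Kaplan-Meier, competing events treated as censored.
    crude = cumulative incidence of implant failure (Aalen-Johansen form).
    """
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause)
    km_surv = 1.0   # survival from implant failure only
    all_surv = 1.0  # survival free of any event, just before the current time
    cif = 0.0
    for t in np.unique(time[(cause > 0) & (time <= t_eval)]):
        n_risk = np.sum(time >= t)
        d1 = np.sum((time == t) & (cause == 1))
        d2 = np.sum((time == t) & (cause == 2))
        cif += all_surv * d1 / n_risk
        km_surv *= 1.0 - d1 / n_risk
        all_surv *= 1.0 - (d1 + d2) / n_risk
    return float(1.0 - km_surv), float(cif)

# Hypothetical population: 5% net implant failure and 30% mortality at 10 years,
# independent (non-informative) competing risk, follow-up stopped at 10 years.
rng = np.random.default_rng(2017)
n, horizon = 5000, 10.0
t_fail = rng.exponential(-horizon / np.log(1 - 0.05), n)
t_death = rng.exponential(-horizon / np.log(1 - 0.30), n)
time = np.minimum(np.minimum(t_fail, t_death), horizon)
cause = np.where(time == horizon, 0, np.where(t_fail < t_death, 1, 2))

net, crude = net_and_crude_failure(time, cause, horizon)
print(f"net failure (1-KM): {net:.4f}, crude failure (CIF): {crude:.4f}")
```

Under these assumptions the crude estimate sits below the net estimate, because individuals who die can no longer experience implant failure.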
We evaluate the performance of each method using power, bias, root-mean-square error, coverage and CI width.
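As a rough sketch of how these performance measures might be computed across simulation replicates, consider the following. The abstract does not give the exact definitions used, so the function name `summarise`, its arguments, and the definition of power as the proportion of replicates whose upper confidence limit falls below the benchmark plus margin are assumptions.

```python
import numpy as np

def summarise(est, lower, upper, truth, ni_limit):
    """Summarise replicate-level estimates of failure at a fixed time point.

    est, lower, upper: point estimates and confidence limits, one per replicate.
    truth:    true failure probability used to generate the data.
    ni_limit: benchmark plus non-inferiority margin (hypothetical threshold).
    """
    est, lower, upper = (np.asarray(a, dtype=float) for a in (est, lower, upper))
    err = est - truth
    return {
        "bias": float(err.mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "coverage": float(((lower <= truth) & (truth <= upper)).mean()),
        "ci_width": float((upper - lower).mean()),
        "power": float((upper < ni_limit).mean()),
    }

# Two made-up replicates, purely to show the call signature.
result = summarise(est=[0.04, 0.06], lower=[0.02, 0.055], upper=[0.06, 0.08],
                   truth=0.05, ni_limit=0.075)
print(result)
```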
1-Kaplan-Meier provides an unbiased estimate of implant net failure, which can be used to assess if a surgical device is non-inferior to an external benchmark. Small non-inferiority margins require significantly more individuals to be at risk compared with current benchmarking standards.
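A one-sample non-inferiority comparison against an external benchmark could be sketched as below. The decision rule (upper one-sided 95% confidence limit of 1-KM below benchmark plus margin), the Greenwood standard error on the plain scale, and the numerical values (5% benchmark at 10 years, 2.5% margin, 3200 procedures) are illustrative assumptions, not necessarily the procedure or parameters used in the study.

```python
import numpy as np

def km_failure(time, event, t_eval):
    """Return the 1-Kaplan-Meier failure estimate at t_eval and its Greenwood SE."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv, var_sum = 1.0, 0.0
    for t in np.unique(time[event & (time <= t_eval)]):
        n_risk = np.sum(time >= t)
        d = np.sum((time == t) & event)
        surv *= 1.0 - d / n_risk
        if n_risk > d:  # Greenwood term is undefined once everyone at risk has failed
            var_sum += d / (n_risk * (n_risk - d))
    return float(1.0 - surv), float(surv * np.sqrt(var_sum))

def non_inferior(time, event, t_eval, benchmark, margin, z=1.645):
    """Declare non-inferiority if the upper one-sided limit is below benchmark + margin."""
    fail, se = km_failure(time, event, t_eval)
    upper = fail + z * se
    return bool(upper < benchmark + margin), fail, upper

# Hypothetical implant with a true 10-year failure equal to the benchmark.
rng = np.random.default_rng(2017)
n, horizon, benchmark, margin = 3200, 10.0, 0.05, 0.025
t_fail = rng.exponential(-horizon / np.log(1 - benchmark), n)
time = np.minimum(t_fail, horizon)
event = t_fail <= horizon

decision, fail, upper = non_inferior(time, event, horizon, benchmark, margin)
print(f"1-KM = {fail:.4f}, upper limit = {upper:.4f}, non-inferior: {decision}")
```

Repeating this over many simulated datasets and averaging `decision` would give an empirical estimate of power for a chosen margin and sample size.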
A non-inferiority testing paradigm provides a useful framework for determining whether an implant meets the required performance defined by an external benchmark. Current benchmarking standards have limited power to detect non-inferiority, and substantially larger sample sizes, in excess of 3200 procedures, are required to achieve a power greater than 60%. When benchmarking implant performance, net failure estimated using 1-KM is clearly preferable to crude failure estimated by competing risk models.