Johnson & Johnson Pharmaceutical Research and Development LLC, Titusville, NJ 08560, USA.
Stat Med. 2012 Dec 30;31(30):4401-15. doi: 10.1002/sim.5620. Epub 2012 Sep 27.
Expanded availability of observational healthcare data (both administrative claims and electronic health records) has prompted the development of statistical methods for identifying adverse events associated with medical products, but the operating characteristics of these methods when applied to real-world data are unknown.
We studied the performance of eight analytic methods in estimating the strength of association, expressed as the relative risk (RR) and its standard error, for 53 drug-adverse event pairs comprising both positive and negative controls. The methods were applied to a network of ten observational healthcare databases covering over 130 million lives. Performance measures included the sensitivity, specificity, and positive predictive value of each method at RR thresholds achieving statistical significance at p < 0.05 or p < 0.001 and with an absolute threshold of RR > 1.5, as well as threshold-free measures such as the area under the receiver operating characteristic curve (AUC).
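The threshold-based measures and the AUC described above can be illustrated with a minimal Python sketch. It assumes that each method returns an RR point estimate with a standard error on the log scale, that significance comes from a two-sided Wald z-test on log(RR), and that AUC is computed by ranking the control pairs on the RR point estimate. The `Estimate` type, the function names, and the toy numbers are hypothetical illustrations, not the study's implementation or data.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    rr: float        # RR point estimate from one method for one drug-outcome pair
    se_log: float    # assumed: standard error of log(RR)
    positive: bool   # True = positive control, False = negative control


def two_sided_p(rr: float, se_log: float) -> float:
    """Two-sided Wald p-value for H0: RR = 1 (assumed test, on the log scale)."""
    z = math.log(rr) / se_log
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def flagged(e: Estimate, alpha: float = 0.05, rr_min: float = 1.0) -> bool:
    """A pair is 'flagged' when RR exceeds rr_min and p < alpha."""
    return e.rr > rr_min and two_sided_p(e.rr, e.se_log) < alpha


def threshold_metrics(estimates, alpha=0.05, rr_min=1.0):
    """Sensitivity, specificity and PPV at one significance/RR threshold."""
    tp = fp = tn = fn = 0
    for e in estimates:
        hit = flagged(e, alpha, rr_min)
        if e.positive:
            if hit:
                tp += 1
            else:
                fn += 1
        else:
            if hit:
                fp += 1
            else:
                tn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


def auc(estimates) -> float:
    """Threshold-free AUC: probability that a random positive control has a
    higher RR than a random negative control (ties count one half)."""
    pos = [e.rr for e in estimates if e.positive]
    neg = [e.rr for e in estimates if not e.positive]
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy values, for illustration only
    toy = [
        Estimate(2.0, 0.2, True),
        Estimate(1.2, 0.3, True),
        Estimate(1.5, 0.1, False),
        Estimate(0.9, 0.2, False),
    ]
    print(threshold_metrics(toy))              # RR > 1, p < 0.05
    print(threshold_metrics(toy, rr_min=1.5))  # RR > 1.5, p < 0.05
    print(auc(toy))
```

On these toy values, the RR > 1, p < 0.05 rule gives sensitivity, specificity, and PPV of 0.5 each; requiring RR > 1.5 as well raises specificity and PPV to 1.0; the AUC is 0.75. Ranking by the point estimate is only one choice; ranking by the p-value or by a lower confidence bound would give a different ROC curve.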
Although no single method demonstrated superior performance, the aggregate results provide a benchmark and baseline expectation for the performance of risk identification methods. At traditional levels of statistical significance (RR > 1, p < 0.05), all methods had a false positive rate >18% and a positive predictive value <38%. The best-performing method, high-dimensional propensity score, achieved an AUC of 0.77. At 50% sensitivity, the false positive rate ranged from 16% to 30%; at a 10% false positive rate, the sensitivity of the methods ranged from 9% to 33%.
Systematic processes for risk identification can provide useful information to supplement an overall safety assessment, but assessment of method performance suggests a substantial chance of identifying false positive associations.